The use of statistical models for forecasting and economic control
has  received  widespread  attention  in  recent  years.  Most  of  this  atten- 
tion has  been  focused  on  the  problems  caused  by  uncertainty  concerning 
the  parameters  of  a  given  model,  whereas  little  attention  has  been  paid 
to  the  problems  caused  by  uncertainty  concerning  the  specification  of 
the  model  itself.   In  this  dissertation  Bayesian  methodology  is  employed 
to  treat  model  specification  uncertainty  in  forecasting  and  control 
environments.  The  implications  of  forecasting  with  and  without  formal 
regard  for  model  specification  uncertainty  are  explored  via  a  comparison 
of  the  recommended  methodology  and  alternative  methods  which  involve  the 
selection  of  a  single  model.  The  recommended  methodology  is  applied  to 
single-period  economic  control  problems.  In  particular,  certainty- 
equivalent  and  optimal  analytic  solutions  are  found  for  problems  in 
which there exist two viable alternative linear models of the
data-generating process each with a different instrument and no intercept
term. Solutions are obtained for situations in which control is
cost-free  and  in  which  various  instrument-use  cost  functions  are  known. 
Finally,  a  Bayesian  procedure  for  modeling  and  making  inferences  about 
particular  nonstationary  data-generating  processes  is  introduced.  This 
procedure  characterizes  data  as  being  generated  by  different  statistical 
models in different time periods with the switch between models
controlled by some random process.

CHAPTER I
INTRODUCTION 

The  use  of  statistical  models  for  forecasting  and  economic 
control has received widespread attention in recent years. Most of
this  attention  has  been  focused  on  the  problems  caused  by  uncertainty 
concerning  the  parameters  of  a  given  model.  As  a  result,  much  has 
been  written  about  parameter  specification  and  estimation  and  their 
decision-making  implications,  whereas  little  analytical  attention 
has  been  paid  to  the  problems  caused  by  uncertainty  concerning  the 
specification  of  the  model  itself.  The  implications  of  this  type  of 
uncertainty  for  forecasting  and  control  are  virtually  unexplored.  That 
these  implications  are  significant  and  worth  exploring  has  been  ex- 
pressed by  Pierce: 

Another area of uncertainty has to do with our models. . . .
The problem lies not only with uncertainty concerning
the true value of model parameters, but also with the
structure of models themselves. . . . We have found
that with some relatively minor changes in the
specification of our quarterly model . . . we can importantly
alter its policy multipliers.¹


¹ J. L. Pierce, "Quantitative Analysis for Decisions at the Federal
Reserve," Annals of Economic and Social Measurement, 3 (1974), 1-9.


In this dissertation the Bayesian Model Comparison procedure
developed by Geisel from the work of Roberts is advocated as a method
for formally treating model specification uncertainty in forecasting
and control problems. The implications for forecasting with and
without regard for model specification uncertainty are examined, and
the Bayesian Model Comparison procedure is applied to simple
single-period economic control problems.

The  following  sections  of  this  chapter  introduce  definitions  and 
discuss  concepts  that  will   be  referred  to  throughout  the  remainder 
of  the  dissertation. 

1.1     Statistical   Models 
Throughout  this  dissertation  the  term  "model"   refers  to  a  para- 
metric statistical   characterization  of  a  data-generating  process 
composed  of  both  deterministic  and  random  components.     The  general 
linear  model    used  in   regression  analysis  is  an  example  of  such  a  charac- 
terization.     Each  such  model    describes  the  data-generating  process  via 
a  family  of  probability  density  functions   in  which  each  member  of  the 
family  depends  on  a  finite  number  of  parameters,  probability  density 
functions  over  the  parameters,   and  predetermined  values  of  a  specified 
set  of  variables  upon  which  it  has  been  hypothesized  that  the  data- 
generating  process  depends. 


Martin   S.    Geisel,   "Comparing  and  Choosing  Among  Parametric  Sta- 
tistical  Models:     A  Bayesian  Analysis  with  Macroeconomic  Applications" 
(Ph.D.   dissertation,   University  of  Chicago:     1971). 

Harry  V.    Roberts,   "Probabilistic  Prediction,"  Journal   of  the 
American  Statistical   Association,  60   (March,   1965),  50-62. 


Statistical  models  are  used  to  describe  the  stochastic  behavior 
of  a  data-generating  process.  Decision  makers  use  them  "as  if"  they 
were  actually  generating  the  data  of  interest.  Any  reference  to  a 
model  as  being  the  "true"  or  "correct"  model  of  a  data-generating 
process  should  not  be  taken  literally.  A  model  is  referred  to  as 
being  the  "true"  model  only  insofar  as  it  behaves  "as  if"  it  were 
generating  the  observed  data. 

1.2  Model  Specification  Uncertainty 
The  statistical  models  discussed  in  the  previous  section  ex- 
plicitly admit  uncertainty  about  the  data-generating  process  through 
their  parameters  and  random  error  terms.  These  two  sources  of  uncer- 
tainty will  be  referred  to  as  parameter  uncertainty  and  random  error 
(or  residual)  uncertainty.  Random  error  is  present  in  each  model 
since  the  deterministic  component  of  the  model  cannot  realistically 
be  expected  to  account  for  all  factors  influencing  a  realization  of 
the  data-generating  process.  Parameter  uncertainty  is  present  since 
a  model's  parameters  are  typically  not  observable  and  must  be  estimated 
from  sample  data.  Being  explicitly  present  in  a  statistical  model, 
these  two  types  of  uncertainty  and  their  implications  for  decision 
making  have  received  considerable  attention  in  the  literature. 
Thus,  it  is  well-known  that  the  appropriate  use  of  a  statistical  model 
in  decision  making  requires  the  consideration  and  treatment  of  both 
parameter  and  random  error  uncertainty. 


References are provided in Chapters III and IV.

When a decision maker is uncertain as to the functional form of
his  model  and/or  is  uncertain  as  to  the  set  of  variables  upon  which 
the  data-generating  process  depends,  model  specification  uncertainty 
is  said  to  be  present.  Model  specification  uncertainty  and  its  decision- 
making implications  have  received  little  attention  in  the  literature. 
As  a  result,  model  specification  uncertainty  is  typically  ignored  or 
assumed  away  in  the  statistical  analysis  of  data-generating  processes 
that  precedes  decision-making.   The  usual  procedure  is  for  the  decision- 
maker to  utilize  sample  information  to  aid  in  the  selection  of  a  model 
from  a  set  of  models  he  believes  to  be  viable  alternative  representa- 
tions of  the  data-generating  process.  The  chosen  model  is  then  assumed 
to appropriately represent the data-generating process, and the decision
maker  bases  his  decisions  on  the  information  provided  by  this  model. 
Such  a  procedure  can  formally  consider  only  parameter  and  random  error 
uncertainty.  Depending  on  the  particular  model  selection  procedure 
utilized,  model  specification  uncertainty  is  either  completely  ignored 
or  suboptimally  treated.  The  result  is  that  some  or  all  of  the  informa- 
tion provided  about  the  data-generating  process  by  the  set  of  models 
which  were  not  chosen,  but  were  believed  to  be  viable,  is  lost.  This 
loss  is  analogous  to  the  information  loss  that  would  occur  if  the  deci- 
sion maker  assumed  he  knew  the  parameters  of  a  given  model  and  made  his 
decisions  without  acknowledging  parameter  uncertainty.  Chapters  III  and 
IV  will  discuss  in  detail  the  decision-making  implications  of  the  infor- 
mation loss  caused  by  failing  to  treat  model  specification  uncertainty. 


An interesting exception is the recent paper by M. Brenner, "The
Effect of Model Misspecification on Tests of the Efficient Market
Hypothesis," Journal of Finance, 32 (1977), 57-66. There are other
exceptions as well.


1.3     The  Bayesian  Approach  to   Inference  and  Decision  Making 
In  this  dissertation  uncertainty  is  dealt  with  via  Bayesian 
inferential   procedures.     This  section  briefly  reviews  the  methodology 
of  Bayesian   inference. 

1.3.1     The  Predictive  Distribution 

Decisions  frequently  hinge  on  the  future  outcome  of  a  data- 
generating  process.     In  such  cases  decision  makers  typically  use  a 
statistical   model   to  characterize  the  data-generating  process.     If 
model    specification  uncertainty  is  negligible  and  the  parameters  of 
the  model   are  known,   then  the  decision  maker  can  feel    secure  in  basing 
his  decision  on  the   information  provided  him  by  his  model.     However,   if 
the  parameters  of  the  model   are  unknown  the  model    should  be  altered  to 
reflect  the  decision-maker's   uncertainty  concerning  the  parameters. 
This  can  be  accomplished  by  treating  the  parameters  as  random  variables, 
utilizing  a  probability  distribution  over  the  parameters  to  reflect  the 
decision-maker's  parameter  uncertainty,  and  computing  the  marginal 
distribution  of  future  realizations  of  the  data-generating  process, 
i.e.,   the  distribution  of  future  realizations  which  is  not  conditioned 
on  the  model's  parameters. 

For a thorough discussion of methodology reviewed in this section
see Howard Raiffa and Robert Schlaifer, Applied Statistical Decision
Theory (Cambridge, Mass.: The M.I.T. Press, 1961).

Suppose the decision-maker's statistical model describes the
data-generating process via the sampling distribution $f(y_F \mid \theta)$,
where $y_F$ is a future value of the data-generating process
($y_F \in Y$) and the parameters of the data-generating process are
represented by $\theta$ ($\theta \in \Theta$). Then, if the
decision-maker's parameter uncertainty can be described by a probability
distribution $g'(\theta)$, the decision maker can compute the marginal
distribution of future realizations of the data-generating process as
follows:

$$f(y_F) = \int_{\Theta} g'(\theta)\, f(y_F \mid \theta)\, d\theta. \qquad (1.1)$$

This distribution is referred to as a predictive distribution.

If the decision maker is able to obtain a sample from the
data-generating process of interest he may update his distribution of
$\theta$ to reflect the sample information. Then, utilizing his revised
distribution of $\theta$, he may recompute his predictive distribution
of $y_F$ so that it too reflects the sample information. The revision
of $g'(\theta)$ is accomplished via Bayes' Rule:

$$f''(\theta \mid y) = \frac{g'(\theta)\, f(y \mid \theta)}{\int_{\Theta} g'(\theta)\, f(y \mid \theta)\, d\theta}. \qquad (1.2)$$

The function $g'(\theta)$ is called the decision-maker's prior
distribution of $\theta$ since it was established prior to obtaining the
sample $y$. The function $f(y \mid \theta)$ is a likelihood function. It
describes the likelihood of the given sample result, $y$, for different
values of $\theta$. The function $f''(\theta \mid y)$ is the
decision-maker's revised distribution of $\theta$. It is called a
posterior distribution since it was computed following the receipt of
sample information. The posterior distribution reflects all the
information about $\theta$ currently available to the decision maker.
This information may be incorporated into his predictive distribution
of $y_F$ as follows:

$$f(y_F \mid y) = \int_{\Theta} f''(\theta \mid y)\, f(y_F \mid \theta)\, d\theta. \qquad (1.3)$$

It  is  from  this  distribution  that  needed  information  about  future 
observations  of  the  data-generating  process  should  be  extracted.  As 
more  sample  and/or  subjective  information  about  the  process  becomes 
available,  the  decision  maker  can  formally  revise  his  predictive  dis- 
tribution to  reflect  that  information  by  repeatedly  applying  the 
above  procedure. 
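
As an illustration only, the three integrals (1.1), (1.2) and (1.3) can be approximated on a grid of parameter values. The sketch below is not part of the original analysis: it assumes a normal sampling model with known standard deviation, a normal prior on $\theta$, and a simple Riemann sum, and every name in it (make_grid, posterior_on_grid, predictive_density, NOISE_SD) is hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def make_grid(lo, hi, n):
    """Evenly spaced grid of n parameter values on [lo, hi], plus the grid spacing."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)], step


def normalize(density, step):
    """Rescale grid values so that their Riemann sum equals one."""
    total = sum(density) * step
    return [d / total for d in density]


def posterior_on_grid(grid, step, prior, sample, noise_sd):
    """Equation (1.2): prior times likelihood, renormalized over the grid."""
    unnormalized = []
    for theta, p in zip(grid, prior):
        likelihood = math.prod(normal_pdf(y, theta, noise_sd) for y in sample)
        unnormalized.append(p * likelihood)
    return normalize(unnormalized, step)


def predictive_density(grid, step, density, y_future, noise_sd):
    """Equation (1.1) when `density` is the prior, (1.3) when it is the posterior."""
    return step * sum(
        d * normal_pdf(y_future, theta, noise_sd) for theta, d in zip(grid, density)
    )


# Assumed setup: y ~ N(theta, 1), prior theta ~ N(0, 2^2), three observations.
NOISE_SD = 1.0
grid, step = make_grid(-12.0, 12.0, 2401)
prior = normalize([normal_pdf(t, 0.0, 2.0) for t in grid], step)
sample = [1.2, 0.8, 1.0]

post = posterior_on_grid(grid, step, prior, sample, NOISE_SD)
prior_pred = predictive_density(grid, step, prior, 1.0, NOISE_SD)
post_pred = predictive_density(grid, step, post, 1.0, NOISE_SD)
```

Under these assumptions the grid results can be compared with the closed-form normal results; the posterior predictive density is more concentrated than the prior predictive density because the sample has reduced the uncertainty about $\theta$.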

1.3.2  The  Posterior  Distribution 

There are three inputs to Bayes' Rule: (1) the decision-maker's
prior information about $\theta$ expressed via $g'(\theta)$; (2) sample
observations from the data-generating process; and (3) the choice of the
functional form of the data-generating process, i.e., the choice of a
likelihood function. The output of Bayes' Rule is an inferential
statement about $\theta$ in the form of a probability distribution,
$f''(\theta \mid y)$. A decision maker interested in obtaining
information about a parameter of the data-generating process should
compute $f''(\theta \mid y)$. The function $f''(\theta \mid y)$ can stand
alone as an inferential statement about $\theta$, or it can be used to
determine point and interval estimates of $\theta$. As more sample
and/or subjective information about the data-generating process becomes
available, Bayes' Rule can be reapplied to revise $f''(\theta \mid y)$.
The sequential application of Bayes' Rule permits the decision maker to
formally learn about $\theta$ over time.
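
The sequential use of Bayes' Rule can be sketched as follows. This is a minimal illustration under assumptions that do not appear in the text: a Bernoulli data-generating process, a parameter restricted to five hypothetical values, and a uniform prior. It shows that revising the distribution one observation at a time yields the same posterior as a single revision based on the whole sample.

```python
def bayes_update(prior, likelihood):
    """One application of Bayes' Rule over a discrete set of parameter values."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def bernoulli_likelihood(thetas, y):
    """Likelihood of a single 0/1 observation for each candidate theta."""
    return [theta if y == 1 else 1.0 - theta for theta in thetas]


# Hypothetical parameter values, prior, and sample.
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = [0.2] * 5
sample = [1, 0, 1, 1]

# Sequential revision: yesterday's posterior is today's prior.
sequential = prior
for y in sample:
    sequential = bayes_update(sequential, bernoulli_likelihood(thetas, y))

# Single revision using the likelihood of the whole sample (3 ones, 1 zero).
batch = bayes_update(prior, [theta ** 3 * (1.0 - theta) for theta in thetas])
```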

1.4  Chapter  Outline  and  Preview  of  Results 
Typically econometric forecasting and control models are developed
and used without formally considering the full impact of model
specification uncertainty. The usual procedure is to (1) utilize a model
selection technique to choose one model from a set of alternative
competing models to characterize the data-generating process, and (2) assume
the  chosen  model  to  be  the  correct  model  of  the  data-generating  process 
and  use  it  to  forecast  and/or  control  the  process.  Such  procedures 
either  ignore  or  do  not  fully  consider  the  information  about  the  data- 
generating  process  contributed  by  the  models  that  were  proposed  as 
being  viable  but  were  not  selected  by  the  model  selection  procedure. 
Further,  in  assuming  the  chosen  model  is  the  correct  model  of  the 
process,  the  forecaster  or  controller  is  behaving  as  though  he  faces  a 
lesser  degree  of  uncertainty  than  is  really  the  case.  Thus,  in  utilizing 
model  selection  procedures,  forecasters  and  controllers  are  simultane- 
ously discarding  relevant  information  about  the  data-generating  process 
and  behaving  as  if  they  have  more  information  than  is  actually  possessed. 
This  dissertation  advocates  the  use  of  the  Roberts/Geisel  Bayesian 
Model  Comparison  Procedure  as  a  means  of  comprehensively  treating  model 
specification  uncertainty  and  avoiding  such  contradictory  behavior.  The 
Bayesian  Model  Comparison  Procedure  and  its  origins  are  described  in 
Chapter  II.  Chapter  II  also  describes  a  Bayesian  model  selection  pro- 
cedure referred  to  herein  as  the  Bayesian  Model  Selection  Procedure. 

In Chapter III, the effects of forecasting with and without regard
for model specification uncertainty are examined by comparing forecasts
determined via the Bayesian Model Comparison procedure (BMC) with those
yielded by a Bayesian procedure which fails to appropriately consider
model specification uncertainty, the Bayesian Model Selection procedure
(BMS). The following results are derived:

1.  If the variance of the decision-maker's predictive distribution
is used to measure forecast-risk, and a decision maker
forecasts via the BMS procedure rather than the BMC procedure,
the risk he takes in predicting future values of the
data-generating process is misspecified.

2.  A decision-maker's posterior expected loss from using a
forecast derived via the BMC procedure is less than his posterior
expected loss from forecasting via the BMS procedure.

3.  Point estimates derived via BMS are frequently misplaced.

4.  The reliability of credible intervals derived via the BMS
procedure may be misspecified.
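
Results 1 and 2 can be illustrated with a small numerical sketch. It is not taken from Chapter III. It assumes two models with hypothetical posterior probabilities and hypothetical predictive means and variances, treats the BMC predictive distribution as the probability-weighted mixture of the two model predictive distributions, takes the BMS forecast from the more probable model alone, and assumes squared-error loss.

```python
def mixture_mean_variance(probs, means, variances):
    """Mean and variance of a finite mixture of predictive distributions."""
    mean = sum(p * m for p, m in zip(probs, means))
    second_moment = sum(p * (v + m * m) for p, m, v in zip(probs, means, variances))
    return mean, second_moment - mean * mean


def expected_squared_error(forecast, probs, means, variances):
    """Posterior expected squared-error loss of a point forecast under the mixture."""
    mean, variance = mixture_mean_variance(probs, means, variances)
    return variance + (forecast - mean) ** 2


# Hypothetical posterior model probabilities and per-model predictive moments.
probs = [0.6, 0.4]
means = [10.0, 14.0]
variances = [4.0, 9.0]

# BMC: use the whole mixture.
bmc_mean, bmc_variance = mixture_mean_variance(probs, means, variances)

# BMS: keep only the model with the highest posterior probability.
selected = max(range(len(probs)), key=lambda i: probs[i])
bms_mean, bms_variance = means[selected], variances[selected]

bmc_loss = expected_squared_error(bmc_mean, probs, means, variances)
bms_loss = expected_squared_error(bms_mean, probs, means, variances)
```

With these assumed numbers the variance reported by the selected model differs from the variance of the mixture, and the forecast taken from the selected model carries the larger posterior expected loss, since under squared-error loss the mixture mean minimizes that loss.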

In  Chapter  IV,  the  BMC  procedure  is  applied  to  simple  single- 
period  economic  control  problems.   In  particular,  certainty-equivalent 
and  optimal  analytic  solutions  are  found  for  the  case  of  two  competing 
linear  models  each  with  a  different  instrument  (controllable  variable) 
and  no  intercept  term.  The  following  results  are  obtained: 

1.  The  BMC  certainty-equivalent  control  solution  is  to  set  both 
instruments  as  if  each  instrument's  respective  model  were  in 
fact  the  true  model  of  the  data-generating  process. 

2.  If the variance of the controller's predictive distribution
is used to measure control-risk, and certainty-equivalent
control is utilized, it can be shown that under certain
circumstances the BMS approach to control always understates
the control-risk involved.

3.  The  optimal  BMC  control  solution  is  to  set  both  instruments  as 
if  each  instrument's  respective  model  were  in  fact  the  true 
model  of  the  process.  Since  optimal  BMC  control  treats  model 
specification  uncertainty,  parameter  uncertainty,  and  residual 
uncertainty,  whereas  certainty-equivalent  control  treats  only 
model  specification  uncertainty,  the  optimal  BMC  control  solu- 
tion differs  from  the  BMC  certainty-equivalent  control  solu- 
tion. 

Certainty-equivalent  and  optimal  BMC  control  solutions  for  cases  where 
instrument  use  costs  are  known  are  also  derived  in  Chapter  IV. 

In  Chapter  V,  a  procedure  for  handling  model  nonstationarity  is 
introduced.  Called  Bayesian  Model  Switching,  this  procedure  was 
suggested  by  anomalies  observed  in  sequences  of  posterior  model  proba- 
bilities generated  by  the  BMS  and  BMC  procedures.  The  Bayesian  Model 
Switching  procedure  characterizes  the  data-generating  process  in  a 
manner  similar  to  Quandt's  switching  regression  regimes. 

Chapter  VI  contains  an  overview  of  the  dissertation,  a  discussion 
of  the  shortcomings  of  the  Bayesian  Model  Comparison  and  Bayesian 
Model  Switching  procedures,  and  suggestions  for  future  work  in  the 
area  of  model  specification  uncertainty. 


R. E. Quandt, "A New Approach to Estimating Switching Regressions,"
Journal of the American Statistical Association, 67 (March, 1972),
306-310.


CHAPTER  II 

HYPOTHESIS  TESTING,  BAYESIAN  MODEL  SELECTION,  AND 
BAYESIAN  MODEL  COMPARISON 


The Bayesian Model Comparison approach to handling model
specification uncertainty in decision-making problems has its origins
in the hypothesis testing work of Harold Jeffreys¹ and is a direct
spin-off of a Bayesian procedure developed by Harry Roberts² for
combining expert opinions. Martin Geisel³ adapted Roberts' work for use
in econometrics and in so doing formalized the Bayesian Model
Comparison and Bayesian Model Selection procedures. The contributions
of Jeffreys, Roberts, and Geisel to the existing Bayesian Model
Comparison and Bayesian Model Selection procedures are discussed in
this chapter.

II.1 Harold Jeffreys: Hypothesis Testing⁴

In considering two mutually exclusive and exhaustive hypotheses
about the parameter vector $\theta$ of a probability density function,
Jeffreys suggests that the decision maker should place prior probability
masses on each of the hypotheses. The probabilities should be
consistent with the decision maker's prior information and, consequently,
prior beliefs about the appropriateness of each of the hypotheses.

¹ Harold Jeffreys, Theory of Probability (London: Oxford University
Press, 1961), Chapters 4 and 5.

² Roberts, pp. 50-62.

³ Geisel, pp. 1-45.

⁴ Jeffreys, Chapters 4 and 5.

Thus, if the two hypotheses $H_0$ and $H_1$ are exhaustive and
nonoverlapping their prior probabilities $P'(H_0)$ and $P'(H_1)$ would
be assessed, and must sum to one. If $H_0$ and $H_1$ are a priori
equally likely, $P'(H_0) = P'(H_1)$. It is assumed that given $H_0$ a
future sample result $y$ has probability density function
$f(y \mid H_0)$, and that given $H_1$ is true, $y$'s probability density
function is $f(y \mid H_1)$. Then, using Bayes' Rule, the posterior
probability that, say, $H_0$ is the appropriate hypothesis is

$$P''(H_0 \mid y) = \frac{P'(H_0)\, f(y \mid H_0)}{P'(H_0)\, f(y \mid H_0) + P'(H_1)\, f(y \mid H_1)}, \qquad (2.1)$$

and $P''(H_1 \mid y) = 1 - P''(H_0 \mid y)$. After determining
$P''(H_0 \mid y)$ and $P''(H_1 \mid y)$, the decision maker can choose as
the more appropriate hypothesis the one with the higher posterior
probability. Or, if the decision maker can economically determine the
losses involved from choosing an incorrect hypothesis, he can use
$P''(H_0 \mid y)$ and $P''(H_1 \mid y)$ to determine the expected loss of
choosing $H_0$ or $H_1$ and then select as being the more appropriate
the hypothesis that minimizes his expected loss.
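
The two selection rules just described need not agree. The following sketch evaluates (2.1) for two simple hypotheses and then compares expected losses. The normal sampling density, the observed value, the equal prior probabilities, and the loss figures are all assumed for illustration, and a correct choice is assumed to cost nothing.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probabilities(prior_h0, lik_h0, lik_h1):
    """Equation (2.1) and its complement for two exhaustive hypotheses."""
    weight_h0 = prior_h0 * lik_h0
    weight_h1 = (1.0 - prior_h0) * lik_h1
    total = weight_h0 + weight_h1
    return weight_h0 / total, weight_h1 / total


def expected_losses(post_h0, post_h1, loss_h0_wrong, loss_h1_wrong):
    """Expected loss of choosing H0 and of choosing H1 (a correct choice costs 0)."""
    return post_h1 * loss_h0_wrong, post_h0 * loss_h1_wrong


# Assumed setting: y ~ N(theta, 1); H0: theta = 0, H1: theta = 1; y = 0.8 observed.
y_observed = 0.8
post_h0, post_h1 = posterior_probabilities(
    0.5, normal_pdf(y_observed, 0.0, 1.0), normal_pdf(y_observed, 1.0, 1.0)
)

# Assumed losses: wrongly choosing H1 is three times as costly as wrongly choosing H0.
loss_if_h0, loss_if_h1 = expected_losses(post_h0, post_h1, 1.0, 3.0)

by_probability = "H0" if post_h0 >= post_h1 else "H1"
by_expected_loss = "H0" if loss_if_h0 <= loss_if_h1 else "H1"
```

In this assumed case the hypothesis with the higher posterior probability is not the one that minimizes expected loss, which is why the loss structure matters whenever it can be assessed.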

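More formally, if $H_0$ is $\theta = \theta_0$ and $H_1$ is
$\theta = \theta_1$, where $\theta_0$ and $\theta_1$ are particular
values of the parameter vector (i.e., two simple hypotheses), then (2.1)
would be

$$P''(H_0 \mid y) = P''(\theta_0 \mid y) = \frac{P'(\theta = \theta_0)\, f(y \mid \theta = \theta_0)}{P'(\theta = \theta_0)\, f(y \mid \theta = \theta_0) + P'(\theta = \theta_1)\, f(y \mid \theta = \theta_1)}. \qquad (2.2)$$

If $H_0$ is $\theta \in \Psi_1$ and $H_1$ is $\theta \in \Psi_2$, where
$\Psi_1$ and $\Psi_2$ ($\Psi_1 \cup \Psi_2 = \Psi$) are mutually
exclusive and exhaustive sets (i.e., $H_0$ and $H_1$ are two composite
hypotheses), then it is necessary for the decision maker to assess a
prior pdf for $\theta$ over $\Psi_1$, $P'(\theta \mid \theta \in \Psi_1)$,
and another for $\theta$ over $\Psi_2$,
$P'(\theta \mid \theta \in \Psi_2)$. Then (2.1) would be

$$P''(H_0 \mid y) = P''(\theta \in \Psi_1 \mid y) = \frac{P'(\theta \in \Psi_1)\, f(y \mid \theta \in \Psi_1)}{f(y)}, \qquad (2.3)$$

where

$$f(y \mid \theta \in \Psi_1) = \int_{\Psi_1} P'(\theta \mid \theta \in \Psi_1)\, f(y \mid \theta, \theta \in \Psi_1)\, d\theta$$

and

$$f(y) = \sum_{i=1}^{2} P'(\theta \in \Psi_i)\, f(y \mid \theta \in \Psi_i).$$
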

The next section discusses Harry Roberts' important extension of
Jeffreys' work.

II.2 Harry V. Roberts: Comparing Forecasters

Roberts was concerned with "reconciling conflicting expert
interpretations of the same data."² Building on Jeffreys' work, Roberts
devised a method for discriminating among a set of alternative
parametric statistical models each of which purports to describe some
random process of interest. This Bayesian discrimination procedure
will be discussed in detail in the next section.

It will be assumed that person C knows nothing about a particular
data-generating process $f(y \mid \theta)$, but wishes, for example, to
predict future $y$ values and is, therefore, interested in learning
about the process. Persons A and B possess knowledge about the same
process. A and B express their knowledge about $f(y \mid \theta)$ via the
data distributions $f(y \mid \theta, A)$ and $f(y \mid \theta, B)$,
respectively, and their prior distributions on the parameter $\theta$,
$g(\theta \mid A)$ and $g(\theta \mid B)$. $\theta$ may be a vector. For
expository purposes, only two individuals will be assumed to possess
knowledge about the process, and all probability distributions of this
section will be assumed to be discrete.

Roberts, pp. 50-62.

² Ibid., p. 55.

C's prior distribution for the parameter $\theta$ may be expressed as:

$$g'(\theta) = P'(A)\, g(\theta \mid A) + P'(B)\, g(\theta \mid B). \qquad (2.4)$$

$P'(A)$ and $P'(B)$ sum to one and may be thought of as C's probability
assessment of the accuracy of A's judgment and B's judgment,
respectively. If C had some knowledge about the reliability of opinions
expressed by A and B he might tend to respect the opinion of one,
say A, more than the other, and so assign $P'(A) > P'(B)$. If C knew
nothing about either A or B it would be appropriate for him to assess
$P'(A) = P'(B) = .5$. C can then learn about $f(y \mid \theta)$ by
combining his thoughts (if any) about A and B (reflected in $P'(A)$ and
$P'(B)$) with the opinions expressed by A and B about $f(y \mid \theta)$
(represented by $g(\theta \mid A)$, $f(y \mid \theta, A)$,
$g(\theta \mid B)$, and $f(y \mid \theta, B)$) as in (2.4), and by using
sample information to revise (2.4). Thus it is C's posterior
distribution of $\theta$, $g''(\theta \mid y)$, that C should use in
predicting $y$. Roberts' development of $g''(\theta \mid y)$ is outlined
in the next paragraph.


For another approach to the use of expert opinion see Peter
Morris, "Decision Analysis Expert Use," Management Science, 20 (May,
1974), 1233-41 and "Combining Expert Judgments: A Bayesian Approach,"
Management Science, 23 (March, 1977), 679-693.

Following Roberts, let $\lambda$ index the opinions of A and B, i.e.,
when $\lambda = \lambda_A$ reference is being made to person A, and when
$\lambda = \lambda_B$ reference is being made to person B. With C's
prior distribution for $\lambda$ denoted by $P'(\lambda)$, C's joint
prior distribution for $\lambda$ and $\theta$ is denoted:

$$h'(\lambda, \theta) = P'(\lambda)\, g(\theta \mid \lambda). \qquad (2.5)$$

Accordingly, C's marginal prior distribution for $\theta$ is denoted by

$$g'(\theta) = \sum_{\lambda} P'(\lambda)\, g(\theta \mid \lambda). \qquad (2.6)$$


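Equations (2.6) and (2.4) are equivalent. C's joint posterior
distribution of $\lambda$ and $\theta$ is obtained via Bayes' Rule as
follows:

$$h''(\lambda, \theta \mid y) = \frac{h'(\lambda, \theta)\, f(y \mid \lambda, \theta)}{\sum_{\lambda} \sum_{\theta} h'(\lambda, \theta)\, f(y \mid \lambda, \theta)} = \frac{P'(\lambda)\, g(\theta \mid \lambda)\, f(y \mid \lambda, \theta)}{f(y)}. \qquad (2.7)$$

$f(y \mid \lambda, \theta)$ represents the likelihood of observing the
sample result $y$ given particular values for $\lambda$ and $\theta$.
$f(y)$ is the marginal distribution of the data. Then, recognizing that
$g(\theta \mid \lambda)\, f(y \mid \lambda, \theta) = f(y \mid \lambda)\, g(\theta \mid y, \lambda)$,
$g''(\theta \mid y)$ is obtained from (2.7) as follows:

$$\begin{aligned}
g''(\theta \mid y) &= \sum_{\lambda} \frac{P'(\lambda)\, g(\theta \mid \lambda)\, f(y \mid \lambda, \theta)}{f(y)} \\
&= \sum_{\lambda} \frac{P'(\lambda)\, f(y \mid \lambda)}{f(y)}\, g(\theta \mid \lambda, y) \\
&= \sum_{\lambda} P(\lambda \mid y)\, g(\theta \mid \lambda, y).
\end{aligned} \qquad (2.8)$$

Thus C's posterior distribution of $\theta$ is a weighted average of A's
posterior distribution of $\theta$ and B's posterior distribution of
$\theta$.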
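
The weighted-average result in (2.8) can be checked numerically with discrete distributions, as this section assumes. The sketch below is illustrative only: the three parameter values, the experts' priors, C's equal weights, and the sample are hypothetical, and both experts are assumed to share the same Bernoulli data distribution.

```python
def bernoulli_likelihood(theta, sample):
    """Likelihood of a 0/1 sample for one value of theta."""
    successes = sum(sample)
    failures = len(sample) - successes
    return theta ** successes * (1.0 - theta) ** failures


def combine_experts(thetas, expert_priors, expert_weights, sample):
    """Return C's posterior of theta, the posterior expert weights, and each
    expert's own posterior, following (2.7) and (2.8)."""
    lik = [bernoulli_likelihood(theta, sample) for theta in thetas]
    # Numerator of (2.7): P'(lambda) g(theta | lambda) f(y | lambda, theta).
    joint = {
        name: [expert_weights[name] * g * l for g, l in zip(prior, lik)]
        for name, prior in expert_priors.items()
    }
    f_y = sum(sum(row) for row in joint.values())
    # Sum (2.7) over lambda to obtain g''(theta | y).
    posterior_theta = [
        sum(joint[name][j] for name in joint) / f_y for j in range(len(thetas))
    ]
    # Sum (2.7) over theta to obtain P(lambda | y).
    posterior_weights = {name: sum(row) / f_y for name, row in joint.items()}
    # Each expert's own posterior g(theta | lambda, y).
    expert_posteriors = {
        name: [value / sum(row) for value in row] for name, row in joint.items()
    }
    return posterior_theta, posterior_weights, expert_posteriors


thetas = [0.2, 0.5, 0.8]
expert_priors = {"A": [0.6, 0.3, 0.1], "B": [0.1, 0.3, 0.6]}
expert_weights = {"A": 0.5, "B": 0.5}
sample = [1, 1, 0, 1]

posterior_theta, posterior_weights, expert_posteriors = combine_experts(
    thetas, expert_priors, expert_weights, sample
)

# Last line of (2.8): weighted average of the experts' posteriors.
weighted_average = [
    sum(posterior_weights[name] * expert_posteriors[name][j] for name in expert_priors)
    for j in range(len(thetas))
]
```

In this assumed example the sample contains mostly ones, so the weight C attaches to B, whose prior favored large values of the parameter, rises above its prior value of one half.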

Roberts points out that if (2.7) is summed over $\theta$ rather than
over $\lambda$ (as was done in (2.8)), the marginal posterior
distribution of $\lambda$ is obtained:

$$P(\lambda \mid y) = \frac{P'(\lambda)\, f(y \mid \lambda)}{f(y)}. \qquad (2.9)$$

Roberts notes that in statistical discrimination problems where it is
assumed that $y$ is generated by either $f(y \mid \lambda_A)$ or
$f(y \mid \lambda_B)$, $P'(\lambda_i)$ ($i = A, B$) may be interpreted as
the discriminator's prior probability that $f(y \mid \lambda_i)$
generates $y$, and $P(\lambda_i \mid y)$ may be interpreted as the
discriminator's posterior probability that $f(y \mid \lambda_i)$
generates $y$. Roberts suggests that discrimination between these two
alternative generating processes should be accomplished via examination
of the posterior odds ratio, $P(\lambda_A \mid y) / P(\lambda_B \mid y)$.
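
Because $f(y)$ appears in the denominator of (2.9) for both generating processes, it cancels in the posterior odds ratio, which therefore equals the prior odds multiplied by the ratio of the two likelihoods $f(y \mid \lambda_A) / f(y \mid \lambda_B)$. A brief sketch, with hypothetical prior probabilities and likelihood values:

```python
def posterior_odds(prior_a, prior_b, lik_a, lik_b):
    """Posterior odds of A against B as prior odds times the likelihood ratio."""
    return (prior_a / prior_b) * (lik_a / lik_b)


def posterior_probabilities(prior_a, prior_b, lik_a, lik_b):
    """Equation (2.9) evaluated for each of the two generating processes."""
    f_y = prior_a * lik_a + prior_b * lik_b
    return prior_a * lik_a / f_y, prior_b * lik_b / f_y


# Hypothetical inputs: equal prior probabilities and two likelihood values.
prior_a, prior_b = 0.5, 0.5
lik_a, lik_b = 0.03283, 0.08083

odds = posterior_odds(prior_a, prior_b, lik_a, lik_b)
post_a, post_b = posterior_probabilities(prior_a, prior_b, lik_a, lik_b)
```

An odds ratio below one, as in this assumed case, favors the second generating process.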

Roberts' interpretation of $P(\lambda_i \mid y)$, and his suggested
procedure for discriminating among alternative statistical models, were
formalized by Geisel in his Bayesian Model Selection and Bayesian Model
Comparison schemes. Geisel's extension of Roberts' work is discussed in
the next section.

II.3 Martin S. Geisel: Bayesian Model Comparison and Selection

Geisel's work was concerned with Bayesian procedures for comparing
and  choosing  among  parametric  statistical  models.  His  procedure  for 
comparing  models  will  be  referred  to  as  the  Bayesian  Model  Comparison 
(BMC)  approach.  His  procedure  for  choosing  one  model  from  among  a  set 
of  competing  models  uses  the  same  methodology  as  the  BMC  approach  but 
for  different  purposes.  Consequently,  the  latter  procedure  will  be 
referred  to  here  as  the  Bayesian  Model  Selection  (BMS)  procedure. 


Geisel, pp. 1-45.

Suppose the decision maker feels that any one of $N$ alternative
models could represent the data-generating process of interest to him.
Denote by $P'(M_i)$, $i = 1, 2, \ldots, N$, the decision-maker's prior
probability that $M_i$, the $i$th model, is an accurate representation
of the data-generating process. If the decision maker assesses
$P'(M_i) > 0$, then the model should be included in the set of $N$
models. It follows that $\sum_{i=1}^{N} P'(M_i) = 1$. The unknown vector
of parameters of $M_i$ is denoted by $\theta_i$, $i = 1, \ldots, N$,
where $\theta_i \in \Theta_i$. The decision maker's knowledge about
$\theta_i$ is described via a prior density function,
$g'(\theta_i \mid M_i)$.

If $M_i$ were known to be the true model and its parameters were
known to be $\theta_i^0$, the data-generating process could be completely
characterized by the density function $f(y \mid \theta_i^0, M_i, D_i)$,
where $y$ is the random variable of interest to the decision maker. In
the forecasting problems of Chapter III, $D_i$, which may be a vector,
will be the explanatory variables of $M_i$ and will be used to help
forecast future values of $y$. In the economic control problems of
Chapter IV, $D_i$ will be the independent variables of $M_i$ and will be
under the control of the decision maker. Once $y$ has been observed,
$f(y \mid \theta_i, M_i, D_i)$, viewed as a function of $\theta_i$, $M_i$
and $D_i$, is a likelihood function and can be used to make inferences
from the data about the correct model and about the parameters of all
the models.


y  may  be  vector-valued,  but  in  order  to  simplify  the  notation 
and  discussion  to  follow  it  is  assumed  that  y  is  a  scalar. 
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As  new  information   is  received  about  the  data-generating  process 
being  modeled,   i.e.,   as  y  is  observed,  the  prior  distribution  on  the 
parameters  of  M.   should  be  revised  to  reflect  this  new  information. 
Revision  for  a  single  model    is  accomplished  exactly  as  if  the  parame- 
ter distribution  of  a  known  data-generating  process  with  unknown 
parameters  were  being  revised.     Applying  Bayes'    Rule  yields 

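$$g''(\theta_i \mid M_i, y, D_i) = \frac{g'(\theta_i \mid M_i)\, f(y \mid \theta_i, M_i, D_i)}{f(y \mid M_i, D_i)} \tag{2.10}$$

where

$$f(y \mid M_i, D_i) = \int_{\Theta_i} g'(\theta_i \mid M_i)\, f(y \mid \theta_i, M_i, D_i)\, d\theta_i. \tag{2.11}$$

The function $g''(\theta_i \mid M_i, y, D_i)$ is the posterior distribution of $\theta_i$.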

Given $M_i$ and $D_i$, and before observing y, $f(y \mid M_i, D_i)$ is commonly
called the predictive density function of y. It is the distribution
of future realizations of the data-generating process conditioned on
$M_i$ being the correct model of the process and unconditioned on $\theta_i$,
the parameters of $M_i$. Having observed y, $f(y \mid M_i, D_i)$ may be thought
of as a "model likelihood" since it compares the relative likelihood
of the data, y, across models. Utilizing these model likelihoods,
Bayes' Rule is invoked a second time to revise the prior model
probabilities:

$$P''(M_i \mid y, D) = \frac{P'(M_i)\, f(y \mid M_i, D_i)}{f(y \mid D)} \tag{2.12}$$

where

$$f(y \mid D) = \sum_{i=1}^{N} P'(M_i)\, f(y \mid M_i, D_i). \tag{2.13}$$

$P''(M_i \mid y, D)$ is the posterior probability that $M_i$ is the correct model.
$f(y \mid D)$ is a predictive distribution, a distribution of future
realizations of the data unconditioned on a particular model being
the correct model. D, written without a subscript, is a vector
comprised of the set of decision variables, $D_i$, $i = 1, 2, \ldots, N$, from all
N models.
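
The two-stage revision in (2.10) through (2.13) can be illustrated with a small Python sketch. The setting is an assumption chosen only because it has closed forms and is not taken from the text: each model is y = theta_i * d_i + e, with normal noise of known variance and a normal prior on theta_i. The names `revise`, `normal_pdf`, and the numerical values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def revise(priors, model_probs, y, d, noise_var):
    """Revise parameter priors and model probabilities after observing y.

    priors      : list of (prior_mean, prior_var) for theta_i (assumed normal)
    model_probs : prior model probabilities P'(M_i)
    d           : instrument value d_i used by each model
    noise_var   : known noise variance (an assumption of this sketch)
    """
    # Model likelihoods f(y | M_i, D_i), eq. (2.11): theta_i is integrated out,
    # which for this conjugate case gives a normal density in closed form.
    liks = [normal_pdf(y, m * di, noise_var + v * di ** 2)
            for (m, v), di in zip(priors, d)]
    # f(y | D), eq. (2.13).
    marginal = sum(p * lik for p, lik in zip(model_probs, liks))
    # Posterior model probabilities, eq. (2.12).
    post_probs = [p * lik / marginal for p, lik in zip(model_probs, liks)]
    # Posterior on theta_i within each model, eq. (2.10), via the conjugate update.
    posteriors = []
    for (m, v), di in zip(priors, d):
        post_var = 1.0 / (1.0 / v + di ** 2 / noise_var)
        post_mean = post_var * (m / v + di * y / noise_var)
        posteriors.append((post_mean, post_var))
    return posteriors, post_probs


# Two hypothetical models with opposite prior beliefs about the slope.
posteriors, post_probs = revise(
    priors=[(1.0, 1.0), (-1.0, 1.0)],
    model_probs=[0.5, 0.5],
    y=2.0,
    d=[2.0, 2.0],
    noise_var=1.0,
)
```

The returned posteriors can be passed back in as priors when a further observation arrives, which mirrors the sequential revision described in the text.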

After observing y and revising the prior distributions on $M_i$
and $\theta_i$, the posterior probability distributions reflect all the
information the decision maker has about the set of models and their
parameters. Any prior information is reflected in the prior
distributions, $P'(M_i)$ and $g'(\theta_i \mid M_i)$. The sample evidence, y, is
incorporated through the likelihood function, $f(y \mid M_i, \theta_i, D_i)$. As additional
information in the form of further observations of y is obtained, it
may be reflected in new posterior distributions that are obtainable
via revision of the existing posteriors (which, relative to the latest
data, are called priors) derived in (2.10) and (2.12) above.

As  long  as  the  data-generating  process  does  not  change  over  time, 
the  application  of  (2.10)  and  (2.12)  to  successive  sets  of  new  data 
permits  the  decision  maker  to  "learn  from  experience"  about  which 
model  of  the  process  is  the  most  appropriate.  When  the  data  may  be 
generated  by  different  models  in  different  time  periods,  successive 
application  of  the  probability  revision  procedures  in  this  section 
would  be  inappropriate.  This  problem  and  an  approach  to  handling  it 
are  discussed  in  Chapter  V. 

The  above  procedure  can  be  used  to  select  a  single  model  to 
represent  a  random  process  by  a  decision  maker  who  is  uncertain  about 
the appropriate form of that process. He can accomplish this by
choosing from his original set of N competing models the one with the
highest  posterior  probability  or,  if  losses  associated  with  choosing 
the  incorrect  model  are  known  or  can  be  estimated,  by  selecting  the 
model  that  minimizes  his  posterior  expected  loss.  The  use  of  posterior 
model  probabilities  for  model  selection  is  the  procedure  referred  to 
in  this  dissertation  as  Bayesian  Model  Selection  (BMS). 

The  decision-maker's  posterior  model  probabilities  indicate  that 
he  is  uncertain  of  the  form  of  the  random  process.  Thus,  any  decision 
procedure  based  on  a  chosen  model  fails  to  appropriately  treat  model 
specification  uncertainty.  Geisel  points  out  that  if  the  posterior 
probability  of  a  model  is  positive,  then  the  model  contributes  to  our 
knowledge  of  future  observations  of  the  random  process  of  interest  and 
there  is  no  theoretical  reason  to  neglect  this  contribution.  Hence, 
any  decision  procedure  that  involves  selecting  a  single  model  from 
among  a  set  of  competing  models  ignores  relevant  information,  and, 
computation  costs  and  other  complexities  aside,  can  only  be  viewed 
as  an  approximation  to  an  optimal  procedure. 

The  key  to  utilizing  all  the  information  contained  in  the  set 
of  competing  models  relative  to  future  observations  of  the  random 
process  lies  in  the  use  of  the  predictive  density  function  derived 
in  (2.13)  above  and  repeated  here  in  more  detail: 


See  A.  M.  Faden  and  G.  C.  Rausser,  "Econometric  Policy  Model 
Construction:  The  Post-Bayesian  Approach,"  Annals  of  Economic  and 
Social Measurement, 5(1976), 349-362.

$$\begin{aligned}
f(y \mid D) &= \sum_{i=1}^{N} P'(M_i) \left[ \int_{\Theta_i} f(y \mid \theta_i, M_i, D_i)\, g'(\theta_i \mid M_i)\, d\theta_i \right] \\
&= \sum_{i=1}^{N} P'(M_i)\, f(y \mid M_i, D_i).
\end{aligned} \tag{2.14}$$

$f(y \mid D)$ is a weighted average of the predictive densities of y for each
of the N models (referred to below as model predictives). It is this
distribution that the decision maker should use to characterize the
random process upon which his decision hinges and about whose form he
is uncertain. This distribution will herein be referred to as a
Bayesian Mixed Model Predictive (BMMP) distribution. The process of
computing and analyzing posterior probabilities and the associated BMMP
distribution is called "comparing models" by Geisel and is referred to
herein as the Bayesian Model Comparison (BMC) procedure.

Suppose that y has been observed and that the decision maker is
interested in making a decision that relates to some future value, $y_F$,
of the random variable. If the decision maker knew the correct model,
say $M_i$, and its parameters, say $\theta_i$, then his distribution of $y_F$ would
be $f(y_F \mid \theta_i, M_i, D_{Fi})$ and his decision would depend on this distribution.
$D_{Fi}$ is used to denote values of the decision variables of model i
associated with $y_F$. But the decision maker knows neither the correct
model nor its parameters. What he does know is summarized in
$P''(M_i \mid y, D)$ and $f(y_F \mid M_i, y, D_i)$. Thus, his distribution of $y_F$ should be
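a BMMP conditioned on the data already observed, y and D:

$$\begin{aligned}
f(y_F \mid y, D, D_F) &= \sum_{i=1}^{N} P''(M_i \mid y, D) \left[ \int_{\Theta_i} f(y_F \mid \theta_i, M_i, D_{Fi})\, g''(\theta_i \mid M_i, y, D_i)\, d\theta_i \right] \\
&= \sum_{i=1}^{N} P''(M_i \mid y, D)\, f(y_F \mid M_i, y, D_i, D_{Fi}).
\end{aligned} \tag{2.15}$$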

This  BMMP  is  a  function  of  all  N  competing  models  and  thus  enables  the 
decision  maker  to  choose  a  course  of  action  in  light  of  all  available 
information  relating  to  yF. 
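
As a rough illustration of the mixture in (2.15), the Python sketch below evaluates a BMMP density when each model predictive is taken to be normal. The normal form, the posterior probabilities, and the predictive means and variances are all assumptions made for the example; `bmmp_pdf` and `trapezoid` are hypothetical helper names.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bmmp_pdf(y_f, post_probs, pred_means, pred_vars):
    """Posterior-probability-weighted average of the model predictives, as in (2.15)."""
    return sum(p * normal_pdf(y_f, m, v)
               for p, m, v in zip(post_probs, pred_means, pred_vars))


def trapezoid(func, lo, hi, steps=20000):
    """Plain trapezoid rule, used here only to check that the mixture integrates to one."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h


post_probs = [0.3, 0.7]   # assumed P''(M_1 | y, D), P''(M_2 | y, D)
pred_means = [0.0, 4.0]   # assumed means of the two model predictives
pred_vars = [1.0, 2.25]   # assumed variances of the two model predictives

total_mass = trapezoid(
    lambda y: bmmp_pdf(y, post_probs, pred_means, pred_vars), -20.0, 30.0)

# Density at y_F = 4 under the BMMP versus under model 2 alone,
# the model a highest-posterior-probability rule would pick here.
bmmp_at_4 = bmmp_pdf(4.0, post_probs, pred_means, pred_vars)
single_model_at_4 = normal_pdf(4.0, pred_means[1], pred_vars[1])
```

In this example the single-model predictive places more density near its own mean than the BMMP does, because the BMMP keeps some weight on the other model.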

Even  when  the  BMMP  is  the  distribution  (model)  that  the  decision 
maker  should  use  to  characterize  the  random  process  in  question,  there 
can  be  at  least  three  reasons  for  selecting  a  single  model  via  the 
Bayesian  Model  Selection  procedure: 

1)  In  comparing  alternative  theories  or  hypotheses 
it  may  be  desirable  to  choose  the  one  with  the 
most  substantive  content. 

2)  It  may  be  more  convenient  to  approximate  the 
random  process  with  a  simple  model. 

3)  The use of a BMMP may prove too costly. In
general, the computation of a BMMP involves
the combination of its components via extensive
numerical methods.

Geisel shows that under certain assumptions, Bayesian Model
Selection provides a Bayesian interpretation for the classical
procedure of choosing from a set of models the one with the lowest
estimated residual variance, $s^2$, or highest coefficient of
determination, $R^2$. Given a set of normal regression models each of which
has the same number of parameters, given diffuse prior distributions
over the models and the parameters of the models, and given a symmetric
loss function with respect to the choice of an incorrect model, Geisel
shows that the procedure of choosing a model with the highest posterior
model probability, $P''(M_i \mid y, D)$, is equivalent to the procedure of
selecting the model with the lowest $s^2$ or highest $R^2$.¹ This result is
very similar to a result derived by Thornber.² Thornber, however,
uses as priors on the parameters of the models those suggested by
Jeffreys' invariance theory,³ whereas Geisel's priors on the parameters
of the models take the form of multinomial and inverted gamma-2
distributions. These results will be discussed in more detail in
Chapter III.

Another important Geisel result that will be drawn upon is his
proof that if, say, $M_i$ is the true model in the set of N competing
models, then as sample evidence accumulates (i.e., $n \to \infty$),
$P''(M_i \mid y, D) \to 1$ and the BMMP $\to f(y_F \mid M_i, y, D_i)$.⁴ Thus, if the
decision maker could wait long enough, the data he would observe would
tell him with near certainty which of the N models was generating the
data. This result will be discussed in more detail in Chapter III.


¹Ibid., pp. 24-37.

²E. H. Thornber, "Applications of Decision Theory to Econometrics"
(Ph.D. dissertation, University of Chicago, 1966), Chapter 2.

³For discussion of Jeffreys' invariance theory see Arnold Zellner,
An Introduction to Bayesian Inference in Econometrics (New York:
Wiley, 1971), pp. 41-53.

⁴Geisel, p. 23.

In the next chapter some of the consequences of forecasting with
and without the use of the Bayesian Model Comparison procedure are
explored. Particular attention is paid to the comparison of the
Bayesian Model Comparison procedure and the Bayesian Model Selection
procedure.


CHAPTER  III 

FORECASTING  WITH  AND  WITHOUT  REGARD 
FOR  MODEL  SPECIFICATION  UNCERTAINTY 


If a decision maker is uncertain as to which one of N random
processes is generating future values of a random variable upon which
the effectiveness of his current decision depends, Geisel contends¹
that the decision maker should use the Bayesian Mixed Model Predictive
(BMMP) distribution of the Bayesian Model Comparison (BMC) procedure
to reflect the information he has concerning the process of interest.
His justification for this approach rests primarily on the following
statement:

Note again that this procedure does not select one
model as "true" or "best" and eliminate the rest.
If the probabilistic weight of a model is positive
it contributes to our knowledge of the future
observations and there is no reason to neglect this
contribution. Thus, any decision theoretic
procedure which is designed to eliminate some of the
models is viewed as an approximation which is used
for reasons of simplicity of view or to reduce the
cost of computation.²

This chapter explores some of the consequences of forecasting with and
without the use of the Bayesian Model Comparison procedure and, in so
doing, attempts to justify more rigorously the use of the BMC
procedure in decision-making problems in which model specification
uncertainty is present. The chapter attempts to explain why
it would frequently be worth the extra cost to use the BMC approach
rather than approaches which, though perhaps simpler and less costly,
fail to fully reflect model specification uncertainty and the totality
of information the decision maker has concerning the process of
interest.

¹Geisel, Chapter II.

²Ibid., p. 19.

In this chapter, forecasting via the Bayesian Model Comparison
procedure will be compared to forecasting via the Bayesian Model
Selection procedure and the maximize-$R^2$ rule. It is shown that when
model specification uncertainty exists, of these three procedures
only the BMC procedure optimally handles the information the decision
maker has concerning the data-generating process whose future values
he wants to forecast. More specifically, if a decision maker forecasts
via the BMS procedure, it is shown that the risk he takes in predicting
future values of the random process of interest is misspecified. It is
also shown that the decision-maker's posterior expected loss from using
a BMC forecast is less than his posterior expected loss from using a
BMS forecast. The last two sections of this chapter compare the
effectiveness of point and interval forecasts generated via the BMC procedure
with those generated via the BMS procedure. It is shown that BMS point
estimates are typically misplaced and that the reliability of BMS
credible intervals may be misspecified.

The following section introduces notation which will be used in the
remainder of the chapter and examines the relationship between the
predictive variance of y as defined by a BMMP distribution, and the
predictive variance of y as defined by the model selected by the BMS
procedure.

III.1  A Comparison of the Predictive Variances Generated by the
Bayesian Mixed Model Distribution and the Bayesian Model
Selection Procedure

Much of the analysis in Section III.2 draws on the relative sizes
of the predictive variance of y as defined by a Bayesian Mixed Model
Predictive distribution, V(BMMP), and the predictive variance of y as
defined by a Bayesian Model Selection Predictive distribution, V(BMSP).¹
Accordingly, to avoid awkward digressions in Section III.2, this section
will be devoted to a comparison of V(BMMP) and V(BMSP).

It was shown in equations (2.14) and (2.15) that the BMMP is a
weighted average of the predictive densities of y for each of N
alternative models. Equation (2.15) is repeated here:

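$$\begin{aligned}
f(y_F \mid y, D, D_F) &= \sum_{i=1}^{N} P''(M_i \mid y, D) \left[ \int_{\Theta_i} f(y_F \mid \theta_i, M_i, D_{Fi})\, g''(\theta_i \mid M_i, y, D_i)\, d\theta_i \right] \\
&= \sum_{i=1}^{N} P''(M_i \mid y, D)\, f(y_F \mid M_i, y, D_i, D_{Fi}).
\end{aligned} \tag{3.1}$$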

The function $f(y_F \mid M_i, y, D_i, D_{Fi})$ will be referred to as a "model
predictive." Recalling equation (2.11), a model predictive is a distribution
of realizations from the data-generating process conditioned on
1) $M_i$ being the correct model of the process; 2) previous observations
of y, the dependent variable of interest, and $D_i$, the decision
variable; and 3) $D_{Fi}$, the value of the decision variable with which the
next y to be observed, $y_F$, is associated. Thus, if the Bayesian Model
Selection procedure chooses, say, $M_i$, it is $M_i$'s predictive
distribution, $f(y_F \mid M_i, y, D_i, D_{Fi})$, that is being chosen to characterize future
observations of y, $y_F$. It is the variance of this predictive
distribution that is referred to as V(BMSP). In general, the mean and
variance of the predictive distribution generated by $M_i$ will be
denoted by $\mu_i$ and $\sigma_i^2$, respectively. The mean and variance of a BMMP
will be denoted by $\mu$ and $\sigma^2$ (or V(BMMP)), respectively.

¹V(BMMP) and V(BMSP) are formally defined below.

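It is shown below that

$$\mu = P''(M_1 \mid y, D)\,\mu_1 + \cdots + P''(M_N \mid y, D)\,\mu_N \tag{3.2}$$

and

$$\sigma^2 = P''(M_1 \mid y, D)\left[\sigma_1^2 + (\mu_1 - \mu)^2\right] + \cdots + P''(M_N \mid y, D)\left[\sigma_N^2 + (\mu_N - \mu)^2\right]. \tag{3.3}$$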

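To demonstrate, first note that $\mu$ can be obtained by definition as

$$\mu = \int_{-\infty}^{\infty} y_F\, f(y_F \mid y, D, D_F)\, dy_F. \tag{3.4}$$

Substituting (3.1) for $f(y_F \mid y, D, D_F)$ in (3.4) yields

$$\mu = \int_{-\infty}^{\infty} y_F \left[ \sum_{i=1}^{N} P''(M_i \mid y, D)\, f(y_F \mid M_i, y, D_i, D_{Fi}) \right] dy_F. \tag{3.5}$$

With the expansion of the sum in equation (3.5), (3.2) is obtained:

$$\begin{aligned}
\mu &= \int_{-\infty}^{\infty} y_F \left[ P''(M_1 \mid y, D)\, f(y_F \mid M_1, y, D_1, D_{F1}) + \cdots + P''(M_N \mid y, D)\, f(y_F \mid M_N, y, D_N, D_{FN}) \right] dy_F \\
&= P''(M_1 \mid y, D) \int_{-\infty}^{\infty} y_F\, f(y_F \mid M_1, y, D_1, D_{F1})\, dy_F + \cdots + P''(M_N \mid y, D) \int_{-\infty}^{\infty} y_F\, f(y_F \mid M_N, y, D_N, D_{FN})\, dy_F \\
&= P''(M_1 \mid y, D)\,\mu_1 + \cdots + P''(M_N \mid y, D)\,\mu_N.
\end{aligned} \tag{3.6}$$

To obtain an expression for the predictive variance of the BMMP,
note that by definition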


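$$V(\mathrm{BMMP}) = \sigma^2 = \int_{-\infty}^{\infty} (y_F - \mu)^2 f(y_F \mid y, D, D_F)\, dy_F. \tag{3.7}$$

Substituting (3.1) for $f(y_F \mid y, D, D_F)$ in equation (3.7) yields

$$\sigma^2 = \int_{-\infty}^{\infty} (y_F - \mu)^2 \sum_{i=1}^{N} P''(M_i \mid y, D)\, f(y_F \mid M_i, y, D_i, D_{Fi})\, dy_F. \tag{3.8}$$

The following is obtained by expanding the sum in equation (3.8):

$$\begin{aligned}
\sigma^2 &= \sum_{i=1}^{N} P''(M_i \mid y, D) \int_{-\infty}^{\infty} (y_F - \mu)^2 f(y_F \mid M_i, y, D_i, D_{Fi})\, dy_F \\
&= \sum_{i=1}^{N} P''(M_i \mid y, D) \int_{-\infty}^{\infty} (y_F^2 - 2 y_F \mu + \mu^2)\, f(y_F \mid M_i, y, D_i, D_{Fi})\, dy_F.
\end{aligned}$$

Working with the ith term of this sum, the following is obtained:

$$P''(M_i \mid y, D)\left\{E_i(y_F^2) - 2\mu E_i(y_F) + \mu^2\right\}. \tag{3.9}$$

Noting that $E_i(y_F^2) = \sigma_i^2 + [E_i(y_F)]^2$ and $E_i(y_F) = \mu_i$, (3.9) becomes

$$P''(M_i \mid y, D)\left\{\sigma_i^2 + \mu_i^2 - 2\mu_i\mu + \mu^2\right\}. \tag{3.10}$$

The three right-hand terms inside the braces of (3.10) may be
factored, yielding:

$$P''(M_i \mid y, D)\left\{\sigma_i^2 + (\mu_i - \mu)^2\right\}.$$

Thus, $\sigma^2$ may be written as follows:

$$\sigma^2 = \sum_{i=1}^{N} P''(M_i \mid y, D)\left\{\sigma_i^2 + (\mu_i - \mu)^2\right\}. \tag{3.11}$$

This is the same as equation (3.3). Defining $P''(M_i) = P''(M_i \mid y, D)$,
(3.11) can be rewritten as follows:

$$\sigma^2 = \sum_{i=1}^{N} P''(M_i)\,\sigma_i^2 + \sum_{i=1}^{N} P''(M_i)(\mu_i - \mu)^2. \tag{3.12}$$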
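
The moment formulas (3.2), (3.11), and (3.12) can be checked numerically. The Python sketch below computes the BMMP mean and variance from assumed posterior model probabilities and assumed model-predictive moments, and compares the variance with a direct numerical integration of definition (3.7). Normal model predictives are assumed for the integration only; all names and numbers are hypothetical.

```python
import math


def mixture_moments(post_probs, means, variances):
    """BMMP mean by (3.2) and BMMP variance by (3.11)."""
    mu = sum(p * m for p, m in zip(post_probs, means))
    var = sum(p * (v + (m - mu) ** 2)
              for p, m, v in zip(post_probs, means, variances))
    return mu, var


def variance_parts(post_probs, means, variances):
    """The two sums in (3.12): average model variance and spread of the model means."""
    mu = sum(p * m for p, m in zip(post_probs, means))
    within = sum(p * v for p, v in zip(post_probs, variances))
    between = sum(p * (m - mu) ** 2 for p, m in zip(post_probs, means))
    return within, between


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def integrated_variance(post_probs, means, variances, lo=-40.0, hi=40.0, steps=40000):
    """Definition (3.7) evaluated by the trapezoid rule (normal predictives assumed)."""
    mu, _ = mixture_moments(post_probs, means, variances)

    def integrand(y):
        density = sum(p * normal_pdf(y, m, v)
                      for p, m, v in zip(post_probs, means, variances))
        return (y - mu) ** 2 * density

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    for k in range(1, steps):
        total += integrand(lo + k * h)
    return total * h


post_probs = [0.4, 0.6]
means = [1.0, 3.0]
variances = [1.0, 4.0]

mu, var = mixture_moments(post_probs, means, variances)
within, between = variance_parts(post_probs, means, variances)
var_numeric = integrated_variance(post_probs, means, variances)
```

With these inputs the second sum in (3.12) is positive because the two predictive means differ, which is the term that drives the comparisons of V(BMMP) and V(BMSP) in the theorems of this section.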

Having defined V(BMMP) and V(BMSP), it is now possible to compare
their magnitudes. Assuming, as will be done for the remainder of this
dissertation unless otherwise noted, that the decision-maker's model
space contains only two models, $M_1$ and $M_2$, the relative magnitudes of
V(BMMP) and V(BMSP) will be examined for each of the following cases:

CASE I:  $\sigma_1^2 = \sigma_2^2$.

CASE II:  $\sigma_1^2 < \sigma_2^2$ and BMS chooses $M_1$.

CASE III:  $\sigma_1^2 < \sigma_2^2$ and BMS chooses $M_2$.

For convenience, $P''(M_i)$ will be used in place of $P''(M_i \mid y, D)$ in the
discussion and proofs of these cases and the lemmas that follow.

This assumption is made in order to simplify the analysis which
follows. For a more precise explanation of this assumption, see
Section III.2.4.

THEOREM 1:  If $\sigma_1^2 = \sigma_2^2$, then V(BMMP) $\geq$ V(BMSP).

PROOF:  When N = 2,

$$\sigma^2 = P''(M_1)\sigma_1^2 + P''(M_2)\sigma_2^2 + P''(M_1)(\mu_1 - \mu)^2 + P''(M_2)(\mu_2 - \mu)^2,$$

and when $\sigma_1^2 = \sigma_2^2$, V(BMSP) $= \sigma_1^2 = \sigma_2^2$.
Since by definition $0 \leq P''(M_1), P''(M_2) \leq 1$ and
$P''(M_1) + P''(M_2) = 1$, it follows trivially that when $\sigma_1^2 = \sigma_2^2$,

$$P''(M_1)\sigma_1^2 + P''(M_2)\sigma_2^2 = \sigma_1^2 = \sigma_2^2.$$

Thus, if

$$P''(M_1)(\mu_1 - \mu)^2 + P''(M_2)(\mu_2 - \mu)^2 \geq 0,$$

V(BMMP) $\geq$ V(BMSP). Since $(\mu_1 - \mu)^2$ and $(\mu_2 - \mu)^2$ are nonnegative,

$$P''(M_1)(\mu_1 - \mu)^2 + P''(M_2)(\mu_2 - \mu)^2 \geq 0$$

and V(BMMP) $\geq$ V(BMSP). Unless $\mu_1 = \mu_2$, in which case $\mu_1 = \mu_2 = \mu$,
$P''(M_1)(\mu_1 - \mu)^2 + P''(M_2)(\mu_2 - \mu)^2$ is strictly greater than zero and
V(BMMP) is strictly greater than V(BMSP).


This dissertation is not concerned with special cases in which
$\mu_1 = \mu_2$ and $\sigma_1^2 = \sigma_2^2$.

THEOREM 2:  If $\sigma_1^2 < \sigma_2^2$ and BMS chooses $M_1$, then V(BMMP) $\geq$ V(BMSP).

PROOF:  Refer to the proof of Theorem 1. Since $\sigma_1^2 < \sigma_2^2$,

$$P''(M_1)\sigma_1^2 + P''(M_2)\sigma_2^2 \geq \sigma_1^2.$$

From the proof of Case I,

$$P''(M_1)(\mu_1 - \mu)^2 + P''(M_2)(\mu_2 - \mu)^2 \geq 0.$$

Thus, it follows that

$$P''(M_1)\sigma_1^2 + P''(M_2)\sigma_2^2 + P''(M_1)(\mu_1 - \mu)^2 + P''(M_2)(\mu_2 - \mu)^2 \geq \sigma_1^2,$$

i.e., V(BMMP) $\geq$ V(BMSP). However, V(BMMP) equals V(BMSP) only if
$P''(M_1) = 1$. But, if $P''(M_1) = 1$, there exists no model specification
uncertainty. Thus, when model specification uncertainty exists,
V(BMMP) is strictly greater than V(BMSP).

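THEOREM 3:  If $\sigma_1^2 < \sigma_2^2$ and BMS chooses $M_2$, then V(BMMP) $\gtreqless$ V(BMSP).

PROOF:  Refer to Theorem 1. Whenever $P''(M_2) \neq 1$,

$$P''(M_1)\sigma_1^2 + P''(M_2)\sigma_2^2 < \sigma_2^2.$$

Therefore,

$$\sigma^2 = P''(M_1)\sigma_1^2 + P''(M_2)\sigma_2^2 + P''(M_1)(\mu_1 - \mu)^2 + P''(M_2)(\mu_2 - \mu)^2 \gtreqless \sigma_2^2$$

depending on the size of $P''(M_1)(\mu_1 - \mu)^2 + P''(M_2)(\mu_2 - \mu)^2$.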

Perhaps the most important thing that Theorems 1, 2, and 3 reveal
is that if model specification uncertainty exists, V(BMMP) $\neq$ V(BMSP),
except for uninteresting cases. This fact will be referred to repeatedly
throughout Chapters III and IV. As will be seen in Section III.2,
the inability to order V(BMMP) and V(BMSP) in Case III poses no problem
with respect to comparing the relative merits of the BMC and BMS
procedures as aids to forecasting. It does, however, make it difficult
to identify whether the measure of forecast-risk provided the decision
maker by the BMS procedure (defined in Section III.2.6 to be V(BMSP))
understates or overstates the actual forecast-risk faced by the decision
maker. This problem is discussed in Section III.2.6. In Sections III.2.6
and IV.3.2, it is shown that Case III may never arise, since situations
exist in which only Case I applies.

The following three lemmas and the discussion that follows them
are useful for helping to order V(BMMP) and V(BMSP) in situations
in which Case III applies. The first provides a necessary and
sufficient condition for V(BMMP) to be greater than V(BMSP).

LEMMA 1:  If $\sigma_1^2 < \sigma_2^2$, $\mu_1 \neq \mu_2$, and BMS chooses $M_2$, then
V(BMMP) > V(BMSP) if and only if

$$P''(M_2 \mid y, D) > \frac{\sigma_2^2 - \sigma_1^2}{(\mu_1 - \mu_2)^2}.$$

PROOF:  1.  If V(BMMP) > V(BMSP), it must be shown that

$$P''(M_2) > \frac{\sigma_2^2 - \sigma_1^2}{(\mu_1 - \mu_2)^2}.$$

Since V(BMMP) $= \sigma^2 = P''(M_1)\sigma_1^2 + P''(M_2)\sigma_2^2 + P''(M_1)(\mu_1 - \mu)^2 + P''(M_2)(\mu_2 - \mu)^2$
and V(BMSP) $= \sigma_2^2$, V(BMMP) > V(BMSP) is the same as

$$P''(M_1)\sigma_1^2 + P''(M_2)\sigma_2^2 + P''(M_1)(\mu_1 - \mu)^2 + P''(M_2)(\mu_2 - \mu)^2 > \sigma_2^2. \tag{3.13}$$

Subtracting $P''(M_1)\sigma_1^2 + P''(M_2)\sigma_2^2$ from both sides of (3.13) yields

$$P''(M_1)(\mu_1 - \mu)^2 + P''(M_2)(\mu_2 - \mu)^2 > \sigma_2^2 - \left[P''(M_1)\sigma_1^2 + P''(M_2)\sigma_2^2\right]. \tag{3.14}$$

From (3.2) it is known that $\mu = P''(M_1)\mu_1 + P''(M_2)\mu_2$. Let the rhs of
(3.14) equal R, and define $P_1 = P''(M_1)$ and $P_2 = P''(M_2)$. Then
substituting for $\mu$ in (3.14) yields

$$P_1(\mu_1 - P_1\mu_1 - P_2\mu_2)^2 + P_2(\mu_2 - P_1\mu_1 - P_2\mu_2)^2 > R. \tag{3.15}$$

Noting that $P_2 = 1 - P_1$, (3.15) can be written

$$P_1(\mu_1 P_2 - \mu_2 P_2)^2 + P_2(\mu_2 P_1 - \mu_1 P_1)^2 > R. \tag{3.16}$$

Factoring $P_2$ out of the first term on the lhs of (3.16) and $P_1$ out
of the second term on the lhs yields

$$P_2 P_1 P_2(\mu_1 - \mu_2)^2 + P_1 P_1 P_2(\mu_2 - \mu_1)^2 > R. \tag{3.17}$$

Noting that $P_1 + P_2 = 1$, and that $P_1 P_2(\mu_1 - \mu_2)^2 = P_1 P_2(\mu_2 - \mu_1)^2$,
(3.17) becomes

$$P_1 P_2(\mu_1 - \mu_2)^2 > \sigma_2^2 - (P_1\sigma_1^2 + P_2\sigma_2^2) = P_1(\sigma_2^2 - \sigma_1^2). \tag{3.18}$$

Dividing both sides of this inequality by $P_1(\mu_1 - \mu_2)^2$, which is
positive when $P_1 > 0$ and $\mu_1 \neq \mu_2$, yields the desired result

$$P_2 > \frac{\sigma_2^2 - \sigma_1^2}{(\mu_1 - \mu_2)^2}.$$

2.  If $P_2 > \dfrac{\sigma_2^2 - \sigma_1^2}{(\mu_1 - \mu_2)^2}$, then V(BMMP) > V(BMSP), i.e.,

$$P_1\sigma_1^2 + P_2\sigma_2^2 + P_1(\mu_1 - \mu)^2 + P_2(\mu_2 - \mu)^2 > \sigma_2^2.$$

A reversal of the steps in the first half of the proof leads
immediately to this result.

Lemma 1 can be combined with Theorem 2 to form a necessary and
sufficient condition for V(BMMP) to be greater than V(BMSP) when,
say, $\sigma_1^2 < \sigma_2^2$, regardless of which model BMS selects.

LEMMA 2:  If $\sigma_1^2 < \sigma_2^2$, then V(BMMP) > V(BMSP) if and only if

a)  Model 1 is selected by BMS,

or

b)  Model 2 is selected by BMS, $\mu_1 \neq \mu_2$, and

$$P''(M_2 \mid y, D) > \frac{\sigma_2^2 - \sigma_1^2}{(\mu_1 - \mu_2)^2}.$$

PROOF:  Lemma 2 results from combining Theorem 2 and Lemma 1, and
its proof follows directly from their proofs.
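
The threshold in Lemma 1 can be checked against a direct evaluation of V(BMMP) for the two-model case. The Python sketch below uses assumed predictive moments and scans a grid of posterior probabilities for model 2; the helper names and numbers are hypothetical, and the grid point that coincides exactly with the threshold is skipped because the two variances are equal there.

```python
def bmmp_variance(p2, mu1, mu2, var1, var2):
    """V(BMMP) for two models, from (3.11) with P1 = 1 - P2."""
    p1 = 1.0 - p2
    mu = p1 * mu1 + p2 * mu2
    return p1 * (var1 + (mu1 - mu) ** 2) + p2 * (var2 + (mu2 - mu) ** 2)


def lemma1_threshold(mu1, mu2, var1, var2):
    """Right-hand side of the Lemma 1 condition; requires mu1 != mu2."""
    return (var2 - var1) / (mu1 - mu2) ** 2


# Assumed predictive moments with var1 < var2, so that a BMS choice of
# model 2 corresponds to Case III and V(BMSP) = var2.
mu1, mu2, var1, var2 = 0.0, 2.0, 1.0, 2.0
threshold = lemma1_threshold(mu1, mu2, var1, var2)

# For every grid value of P2 other than the threshold itself, the direct
# comparison and the Lemma 1 condition should give the same answer.
agree = all(
    (bmmp_variance(k / 100, mu1, mu2, var1, var2) > var2) == (k / 100 > threshold)
    for k in range(1, 100)
    if k != 25
)
```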

It is clear that V(BMMP) > V(BMSP) whenever condition a or b
of Lemma 2 is satisfied. Thus, upon examining condition b the
following can be said:

1.  Other things equal, the greater the distance between the
means, $\mu_1$ and $\mu_2$, of the predictive distributions of the
two models in question, the smaller is the rhs of the
inequality of condition b, and the more likely it is
that V(BMMP) > V(BMSP).

2.  Other things equal, the closer in size are the predictive
variances, $\sigma_1^2$ and $\sigma_2^2$, the smaller is the rhs of the
inequality of condition b, and the more likely it is
that condition b holds, i.e., the more likely it is
that V(BMMP) > V(BMSP).

Both these statements apply regardless of which model is chosen by
BMS, i.e., whether it be the model with the lower or higher
predictive variance.

As an example of how statements one and two might help determine
the relationship between V(BMMP) and V(BMSP), the following is
offered. Suppose the decision-maker's prior information about y
leads him to believe that the predictive variances of both models
are roughly equal, but that their predictive means differ
significantly. By Theorems 1 and 2 and statements one and two above,
the decision maker should consider it more likely that V(BMMP) exceeds
V(BMSP) than if he believed, say, that $\mu_1$ and $\mu_2$ were about the same
size. This follows since a) if $\sigma_1^2$ in fact equals $\sigma_2^2$, then by Theorem 1
V(BMMP) $\geq$ V(BMSP); b) if, say, $\sigma_1^2 < \sigma_2^2$ and the BMS procedure chooses
$M_1$, then Theorem 2 applies and V(BMMP) $\geq$ V(BMSP); and c) if $\sigma_1^2 < \sigma_2^2$
and the BMS procedure chooses $M_2$, then Theorem 3 applies and the
decision-maker's prior information about $\sigma_1^2$, $\sigma_2^2$, $\mu_1$ and $\mu_2$ in concert
with statements one and two above indicates that it is more likely that
V(BMMP) exceeds V(BMSP) than if, say, the decision maker thought $\mu_1$ and
$\mu_2$ were about the same size.

The next section utilizes the results of this section in comparing
the effectiveness of the BMC, BMS, and maximize-$R^2$ approaches to
forecasting.

III.2  Forecasting: Bayesian Model Comparison Versus
Bayesian Model Selection and the Maximize-$R^2$ Rule

Most  forecasting  procedures  handle  model  specification  uncertainty 
suboptimally.  Typically,  a  forecaster  proposes  a  number  of  alternative 
statistical  models  as  possible  candidates  to  represent  the  data- 
generating  process  whose  future  value  he  is  interested  in  predicting 
and then, via some model screening procedure, eliminates all but one
model.


For  a  discussion  of  various  classical  and  Bayesian  model  screening 
procedures,  see  Kenneth  M.  Gaver  and  Martin  S.  Geisel,  "Discriminating 
among  Alternative  Models:  Bayesian  and  Non-Bayesian  Methods,"  Chapter 
Two  in  Paul  Zarembka  (ed.),  Frontiers  in  Econometrics  (New  York: 
Academic  Press,  1974),  pp.  49-77. 




In this section, forecasting as accomplished via two model-screening
procedures, Bayesian Model Selection (BMS) and the classical
maximize-$R^2$ rule approach (max-$R^2$), is compared to forecasting as
handled by a procedure that optimally considers model specification
uncertainty, the Bayesian Model Comparison approach (BMC). Before
proceeding with the comparison, a brief review of BMS, max-$R^2$, and
BMC is in order.


III.2.1  The Bayesian Model Selection Procedure (BMS)

Bayesian Model Selection was discussed in some detail in Chapter
II. Briefly, it requires the following:

1.  The specification of a set of N alternative statistical
models each of which purports to represent the data-generating
process of interest to the forecaster.

2.  The assessment of a prior probability mass function over
the set of N models, $P'(M_i)$, $i = 1, 2, \ldots, N$.

3.  The assessment of prior probability density functions over
the parameters of each model, $g'(\theta_i \mid M_i)$, $i = 1, 2, \ldots, N$.

4.  The specification of a likelihood function for each model,
$f(y \mid \theta_i, M_i, D_i)$, $i = 1, 2, \ldots, N$.¹

5.  The computation of posterior probabilities for the models,
$P''(M_i \mid y, D)$, $i = 1, 2, \ldots, N$.

¹When thought of as a function of y with $\theta_i$, $M_i$, and $D_i$ given,
$f(y \mid \theta_i, M_i, D_i)$ is model i.

The posterior model probabilities are often used to select one model
from among the set of N models to represent the data-generating
process of interest to the forecaster. The usual procedure is to
select the model with the highest posterior model probability. In
the event that the forecaster can estimate the loss that results from
choosing an inappropriate model and can do this for each of the N
models, he can compute his expected loss from choosing each model
and select the model which yields the lowest expected loss.
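
The two selection rules just described can be sketched in Python as follows. The posterior probabilities and loss values are made up for the example, and the convention that `loss[i][j]` is the loss from choosing model i when model j is true is an assumption of this sketch; the function names are hypothetical.

```python
def select_by_probability(post_probs):
    """Index of the model with the highest posterior model probability."""
    return max(range(len(post_probs)), key=lambda i: post_probs[i])


def expected_losses(post_probs, loss):
    """Posterior expected loss of choosing each model.

    loss[i][j] is the loss from choosing model i when model j is true
    (loss[i][i] is taken to be zero in the examples below).
    """
    return [sum(p * loss[i][j] for j, p in enumerate(post_probs))
            for i in range(len(post_probs))]


def select_by_expected_loss(post_probs, loss):
    """Index of the model with the lowest posterior expected loss."""
    losses = expected_losses(post_probs, loss)
    return min(range(len(losses)), key=lambda i: losses[i])


post_probs = [0.6, 0.4]  # assumed posterior model probabilities

symmetric_loss = [[0.0, 1.0],
                  [1.0, 0.0]]
asymmetric_loss = [[0.0, 10.0],   # choosing model 1 when model 2 is true is costly
                   [1.0, 0.0]]

choice_prob = select_by_probability(post_probs)
choice_symmetric = select_by_expected_loss(post_probs, symmetric_loss)
choice_asymmetric = select_by_expected_loss(post_probs, asymmetric_loss)
```

With a symmetric loss the two rules agree, while the asymmetric loss in this example leads the expected-loss rule to pick the model with the lower posterior probability.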

It  should  be  noted  that  BMS  may  also  be  used  for  reasons  other 
than  for  the  selection  of  a  single  model  from  among  a  set  of  N 
models.  For  example,  if  N  is  large,  BMS  can  be  used  to  reduce  the 
number  of  models  in  the  model  space  to  a  number  that  can  be  more 
easily  and  inexpensively  dealt  with  by  a  procedure  such  as  BMC. 
This can be accomplished by eliminating all models from consideration
whose posterior model probability is, say, less than some $\alpha$,
$0 < \alpha < 1$. In this dissertation, however, BMS will be regarded as
a procedure for selecting a single model from among N alternative
models.

The  forecaster  who  uses  BMS  essentially  handles  his  forecasting 
problem  in  a  two-step  sequence:  first,  a  single  model  is  chosen  to 
represent  the  data-generating  process;  second,  under  the  assumption 
that  the  chosen  model  is  in  fact  a  "true"  reflection  of  the  data- 
generating  process,  the  forecaster  addresses  his  prediction  problem. 


Actually the decision-maker must be able to determine the
loss from choosing model i when model j is the true model, $i \neq j$.
There are N(N - 1) such losses.




III.2.2 The Maximize-$R^2$ Rule

The maximize-$R^2$ rule is frequently used to choose one from among
a set of alternative competing linear statistical models whose
explanatory variables are nonrandom. The usual procedure is to estimate
the parameters of each of the alternative models, compute each model's
coefficient of determination, $R^2$, and then select as being the best
representation of the data-generating process the model with the
highest $R^2$. Forecasting is then carried out utilizing the chosen model
as if it were in fact the true model.

It is important to reiterate the well-known fact that $R^2$ is
inversely related to $S^2$, the estimate of the dependent variable's
residual variance. A maximize-$R^2$ rule is therefore equivalent to
a minimize-$S^2$ rule. In other words, the model with the maximum $R^2$
is also the model with the minimum $S^2$.
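
The inverse relation between $R^2$ and $S^2$ can be checked numerically. The sketch below is an illustration only: it assumes two competing one-regressor models with an intercept fitted to the same y values, uses made-up data, and relies on the standard definitions $R^2 = 1 - SSE/SST$ and $S^2 = SSE/(n - k)$ with $k = 2$ estimated coefficients. The function name `fit_simple_ols` is hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (R^2, S^2, SST)."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    s2 = sse / (n - 2)          # two estimated coefficients
    return r2, s2, sst


# Made-up data: the same dependent variable, two candidate regressors.
y = [1.1, 1.9, 3.2, 3.9, 5.1]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.0, 1.0, 4.0, 3.0, 5.0]

r2_x, s2_x, sst = fit_simple_ols(x, y)
r2_z, s2_z, _ = fit_simple_ols(z, y)

# Because SST and n - k are common to both models, S^2 = SST (1 - R^2) / (n - k),
# so ranking by highest R^2 is the same as ranking by lowest S^2.
same_choice = (r2_x > r2_z) == (s2_x < s2_z)
```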

Geisel² and Thornber³ have shown that under certain conditions
model selection as accomplished via the max-$R^2$ rule is equivalent to
the Bayesian Model Selection procedure. The conditions are the
following:

1. The loss structure with respect to the selection of an
incorrect model is symmetric. That is, if the loss from
choosing $M_i$ when $M_j$ is true is represented by $L_{ij}$, then
$L_{ij} = L_{k\ell}$ for all $i, j, k, \ell = 1, 2, \ldots, N$, with $i \neq j$ and
$k \neq \ell$, and $L_{ij} = L_{k\ell} > 0$.

2. $P'(M_1) = P'(M_2) = \cdots = P'(M_N)$, i.e., the prior model
probabilities are equal.

3. The statistical models in question are normal regression
models each of which has the same number of parameters.
The parameters of each are its regression coefficients,
usually denoted by $\beta$'s, and its residual variance, $\sigma^2_{\varepsilon_i}$.
That each model has the same number of coefficients implies
that each model has the same number of independent
(explanatory) variables.

4. The prior density function for the parameters, $\beta_i$ and $\sigma_{\varepsilon_i}$,
is diffuse.

¹For a more detailed discussion of the max-$R^2$ rule see Gaver
and Geisel, pp. 52-53.

²Geisel, pp. 24-37.

³Thornber, Chapter 2.

Geisel and Thornber used different forms for the diffuse prior
density function for the parameters $\beta_i$ and $\sigma_{\varepsilon_i}$, but both showed that
selection of the model with the highest posterior probability is
equivalent to selection of the model with the lowest $S^2$. Since the
model with the lowest $S^2$ also has the highest $R^2$, Geisel and Thornber
have shown that selection of a model via the BMS procedure is
equivalent to selection via the maximize-$R^2$ rule.

Since a model's $R^2$ can be increased simply by adding more
"explanatory" variables to the model, a maximize-$\bar{R}^2$ rule is frequently
used in place of the maximize-$R^2$ rule. $\bar{R}^2$ is defined as follows:¹

$$\bar{R}^2 = R^2 - \frac{(k - 1)}{(n - k)}\,(1 - R^2)$$

where n is the sample size and k is the number of explanatory
variables. The addition of variables will increase the model's $\bar{R}^2$,
the adjusted coefficient of determination, if and only if the F statistic
for the hypothesis that the added variables' coefficients are all zero
is greater than one.¹ Geisel showed that in the two-model case, model
selection via the BMS procedure can be made equivalent to selection
via the maximize-$\bar{R}^2$ rule if the relationships between the parameters
of $M_1$ and $M_2$ are appropriately specified. The required parameter
relationships are, unfortunately, somewhat nonsensical. There are
no known intuitively meaningful sets of assumptions under which the
BMS procedure and the maximize-$\bar{R}^2$ rule are equivalent.²

¹See Gaver and Geisel, pp. 52-54.
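
The adjustment formula above is easy to evaluate directly. The following sketch is illustrative only: the function name and the numerical values are made up, and `k` is used exactly as it enters the formula. It also shows that a small gain in $R^2$ from an added variable can lower $\bar{R}^2$.

```python
def adjusted_r2(r2, n, k):
    """R-bar-squared = R^2 - (k - 1)/(n - k) * (1 - R^2)."""
    if n <= k:
        raise ValueError("the formula requires n > k")
    return r2 - (k - 1) / (n - k) * (1.0 - r2)


# Hypothetical fit: n = 20 observations, k = 3.
base = adjusted_r2(0.800, n=20, k=3)

# Adding one variable raises R^2 only slightly (0.800 -> 0.801),
# and the adjusted measure falls.
extended = adjusted_r2(0.801, n=20, k=4)
adjusted_fell = extended < base
```

The same value is obtained from the algebraically equivalent form $1 - (1 - R^2)(n - 1)/(n - k)$.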

In the remainder of this chapter the four conditions listed
above apply, unless noted otherwise. Thus, to avoid redundancy, the
maximize-$R^2$ rule will not be discussed directly in what follows
but will be addressed indirectly through comments about the equivalent
selection procedure, BMS. Since the BMS and maximize-$R^2$ procedures
are equivalent only in that they select the same model,
only comments concerning the fact that the BMS procedure actually
chooses a model, or comments about which model it chooses, also apply
to the maximize-$R^2$ procedure.


¹John B. Edwards, "The Relationship Between the F-Test and $\bar{R}^2$,"
The American Statistician, 23 (December, 1969), p. 28.

²Geisel, pp. 41-45.

III.2.3 The Bayesian Model Comparison Procedure (BMC)

The  Bayesian  Model  Comparison  procedure  was  discussed  in  detail  in 
Chapter  II.  Briefly,  it  requires  the  following: 

1.  The  specification  of  a  set  of  N  alternative  statistical 
models,  each  of  which  purports  to  represent  the  data- 
generating  process  of  interest  to  the  forecaster. 

2. The assessment of a prior probability mass function over
the set of N models, $P'(M_i)$, $i = 1, \ldots, N$.

3. The assessment of prior probability density functions
over the parameters of each model, $g'(\theta_i \mid M_i)$, $i = 1, \ldots, N$.

4. The specification of a likelihood function for each model,
$f(y \mid \theta_i, M_i, D_i)$, $i = 1, \ldots, N$.

5. The computation of posterior probabilities for the models
(referred to as model probabilities), $P''(M_i \mid y, D)$, $i = 1, 2, \ldots, N$.

6. The computation of the marginal distribution of future values
of the data-generating process. (This distribution, as
noted earlier, is a predictive distribution. It will be
referred to herein as the Bayesian Mixed Model Predictive
(BMMP).)

The  first  five  requirements  are  the  same  as  the  five  requirements 
of  the  Bayesian  Model  Selection  procedure.  It  is  the  sixth  requirement 
that  distinguishes  the  Bayesian  Model  Comparison  procedure  from  the 
Bayesian  Model  Selection  procedure.  Instead  of  choosing  one  of  the  N 
models, as  does  the  BMS  procedure,  BMC  models  the  data-generating 
process of interest with the BMMP distribution.

Recalling (2.14), the BMMP distribution is defined as follows:

$$f(y \mid D) = \sum_{i=1}^{N} P'(M_i)\left[\int_{\Theta_i} f(y \mid \theta_i, M_i, D_i)\, g'(\theta_i \mid M_i)\, d\theta_i\right] \qquad (3.19)$$

$$= \sum_{i=1}^{N} P'(M_i)\, f(y \mid M_i, D_i). \qquad (3.20)$$

All  the  terms  denoted  in  (3.19)  and  (3.20)  were  defined  in  Chapter  II, 
and  the  distributions  denoted  in  (3.19)  and  (3.20)  were  redefined  in 
the  six  requirements  above. 

After  observing  realizations  of  the  data-generating  process  in 
question,  the  BMMP  takes  the  form  presented  in  (2.15): 

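$$f(y_F \mid y, D, D_F) = \sum_{i=1}^{N} P''(M_i \mid y, D)\left[\int_{\Theta_i} f(y_F \mid \theta_i, M_i, D_{Fi})\, g''(\theta_i \mid M_i, y, D_i)\, d\theta_i\right] \qquad (3.21)$$

$$= \sum_{i=1}^{N} P''(M_i \mid y, D)\, f(y_F \mid M_i, y, D_i, D_{Fi}). \qquad (3.22)$$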

Recall that $D = (D_1, D_2, \ldots, D_N)'$, where $D_i$ is a vector containing the
values of model i's explanatory variables that correspond to the most
recently observed y value. $D_F = (D_{F1}, D_{F2}, \ldots, D_{FN})'$, where $D_{Fi}$ is a
vector containing the values of model i's explanatory variables at
the time the next y value is to be generated. From (3.20) or (3.22),
it can be seen that a BMMP distribution is a weighted average, or
mixture, of each model's predictive density of $y_F$, $f(y_F \mid M_i, y, D_i, D_{Fi})$.

The implications of parameter and residual uncertainty for
prediction and decision making have been given considerable attention.
See, for example, any of the following: Theil, Fisher, Brainard,
Leland, Basu, Zellner, Barry and Horowitz, and/or Waud.¹ As noted
in Chapter II, the BMC procedure considers residual, parameter, and
model specification uncertainty. Accordingly, if each model in the
set of N competing models is viewed as a possible "parameter value"
for the process of interest, the BMC procedure may be thought of as a
means for extending the parametric analysis of prediction and
decision-making problems to include consideration of the possibly widely
differing predictive and decision-making implications of the competing
models. Thus, just as a Bayesian can extend predictive analysis by
explicitly allowing for parameter uncertainty instead of just using
parameter estimates, the BMC procedure extends parametric analysis
by explicitly considering model specification uncertainty.
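
The mixture form of the BMMP in (3.22) can be sketched numerically. The example below is a simplified illustration rather than the dissertation's own computation: each model's predictive density is approximated by a normal density (an assumption made here purely for brevity), and the posterior model probabilities, means, and variances are made-up values. The function names are hypothetical.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a normal distribution, used as a stand-in model predictive."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bmmp_pdf(y_f, post, means, variances):
    """Weighted average of the model predictives, with the posterior
    model probabilities as weights."""
    return sum(p * normal_pdf(y_f, m, v)
               for p, m, v in zip(post, means, variances))


# Hypothetical two-model example.
post = [0.7, 0.3]          # posterior model probabilities, summing to one
means = [2.0, 5.0]         # means of the two model predictives
variances = [1.0, 4.0]     # variances of the two model predictives

density_at_3 = bmmp_pdf(3.0, post, means, variances)
```

When one posterior model probability equals one, the mixture collapses to that model's predictive, which is the case in which the BMC and BMS forecasts coincide.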

¹H. Theil, Economic Forecasts and Policy (Amsterdam: North-Holland,
1961). Walter D. Fisher, "Estimation in the Linear Decision
Model," International Economic Review, 3 (January, 1972): 1-29.
William Brainard, "Uncertainty and the Effectiveness of Policy,"
American Economic Review, 57 (May, 1967): 411-25. H. Leland, "The
Theory of the Firm Facing Uncertain Demand," American Economic Review,
62 (1972): 278-291. A. Basu, "Economic Regulation Under Parameter
Uncertainty" (Ph.D. dissertation, Economics Department, Stanford
University, 1973). Zellner, Chapters II, III, and XI. Christopher B.
Barry and Ira Horowitz, "Risk and Economic Policy Decisions," Public
Finance, 30 (no. 2, 1975): 153-165. Roger Waud, "Asymmetric
Policymaker Utility Functions and Optimal Policy Under Uncertainty,"
Econometrica, 44 (January, 1976): 53-66.

A forecaster using the BMC procedure rather than, say, the BMS
procedure, does not have to unnaturally divide the forecasting problem
into two parts. He does not have to first select a model from the set
of N competing models and then, assuming the chosen model to be the
correct model of the process, proceed with his forecasting. He
computes the BMMP distribution for his set of models and uses it
directly to determine, say, point or interval predictions for future
values of y. The forecaster's BMMP distribution reflects his residual,
parameter, and model specification uncertainty, and any predictions
that he makes using his BMMP are made in light of all three types of
uncertainty and with the use of information bearing on any and all of
them. This point will be discussed in greater detail in Section
III.2.5.

The  next  section  sets  forth  the  specific  assumptions  under  which 
the  BMC  and  BMS  procedures  will  be  compared  in  the  remainder  of  the 
chapter. 

III.2.4 Model Space and Assumptions

The comparison of the BMC and BMS procedures (and indirectly
max-$R^2$) that follows will be based on the following assumptions:

1. The decision maker (forecaster) behaves as if he believes
that one or the other of the following two models is an
accurate representation of the random process of interest,
but he is unsure which model is appropriate:

$$M_1:\quad y = \beta_1 X + \varepsilon;$$

$$M_2:\quad y = \beta_2 Z + \delta.$$

y is the variable whose future value the forecaster is
interested in predicting. X and Z are two different
explanatory variables. X and Z are random, but their
values associated with the next y to be generated are known
prior to y's observation. $\beta_1$ and $\beta_2$ are unknown parameters.
$\varepsilon$ and $\delta$ are the usual normally distributed error terms,
each with mean zero and unknown variance, $\sigma_\varepsilon^2$ and $\sigma_\delta^2$,
respectively. It is also assumed that $\mathrm{cov}(\beta_1, \varepsilon) = \mathrm{cov}(\beta_2, \delta) =
\mathrm{cov}(\varepsilon, \delta) = 0$. Thus, $M_1$ and $M_2$ are normal univariate
regression models which, to keep the number of each model's unknown
parameters at two, have been forced through the origin. Since
the values of the explanatory and dependent variables can
always be scaled so that $M_1$ and $M_2$ pass through the origin,
no generality is lost by using models without intercept
terms. Care must be taken, however, to interpret results
in the appropriate units.

2.  The  random  process  of  interest  to  the  forecaster  is 
stationary. 

3. X and Z are uncorrelated and only the explanatory variable
in the true model affects y. Thus, if $M_1$ were the true
model, $\beta_2$ would be zero. If neither $M_1$ nor $M_2$ were the
true model, it may be that $\beta_1 = \beta_2 = 0$.

4. In comparing the BMC and BMS procedures, it will be assumed
that the forecaster may have prior information about the
parameters of $M_1$ and $M_2$. Since model selection via the BMS
procedure and the maximize-$R^2$ rule are equivalent only if
the forecaster has no prior information about the parameters
of the models, any comments made about the BMS procedure
under this assumption do not apply to the maximize-$R^2$ rule.¹

Note that in assumption one above the residual variance of each
model is assumed to be unknown. It would be unrealistic to assume
the residual variance to be known when the correct model of the process
is not known. Further, if $\sigma_\varepsilon^2$ and $\sigma_\delta^2$ were known, or were assumed to be
known, and the correct model was known to be either $M_1$ or $M_2$, the
correct model could be selected by the forecaster with probability one
and there would be no need for procedures such as BMC or BMS.

To illustrate, consider the following argument. For a given X
value the conditional variance of y, $\sigma_{y|X}^2$, is $\sigma_\varepsilon^2$. For a given value
of Z the conditional variance of y, $\sigma_{y|Z}^2$, is $\sigma_\delta^2$. The marginal variance
of y (i.e., y's variance unconditioned on X), $\sigma_y^2$, as described by $M_1$
is $\beta_1^2\sigma_X^2 + \sigma_\varepsilon^2$, and the marginal variance of y as described by $M_2$ is
$\beta_2^2\sigma_Z^2 + \sigma_\delta^2$. If $M_1$ were in fact the true model, then

$$\sigma_y^2 = \beta_1^2\sigma_X^2 + \sigma_\varepsilon^2,$$

$$\sigma_{y|X}^2 = \sigma_\varepsilon^2,$$

and

$$\beta_2 = 0.$$

Since $\beta_2 = 0$, the marginal variance of y as described by $M_2$ is simply
$\sigma_\delta^2$. Thus, since the marginal variance of y is now known to be
$\beta_1^2\sigma_X^2 + \sigma_\varepsilon^2$, it follows that $\sigma_\delta^2 = \beta_1^2\sigma_X^2 + \sigma_\varepsilon^2$. This says that when $M_1$ is
the true model $\sigma_\varepsilon^2 < \sigma_\delta^2$. Consequently, if it is assumed that $\sigma_\varepsilon^2$ and $\sigma_\delta^2$
are known, the model with the lower residual variance can be
identified with probability one as being the true model.

¹The specific conditions under which the BMS and the max-$R^2$
approaches to model selection are equivalent were listed in Section
III.2. Only assumption four of this section affects their equivalency.
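
The variance argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original analysis: $M_1$ is taken to be the true model, X and Z are drawn as independent zero-mean normal variables, the parameter values are arbitrary, and the helper name `fit_through_origin` is hypothetical.

```python
import random


def fit_through_origin(x, y):
    """Least-squares slope and residual variance estimate for y = b*x + error."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, sse / (len(y) - 1)    # one estimated coefficient


rng = random.Random(0)              # fixed seed so the run is repeatable
n = 20000
beta1, sigma_x, sigma_eps = 2.0, 1.0, 1.0   # assumed "true" values under M1

X = [rng.gauss(0.0, sigma_x) for _ in range(n)]
Z = [rng.gauss(0.0, 1.0) for _ in range(n)]          # unrelated to y
y = [beta1 * xi + rng.gauss(0.0, sigma_eps) for xi in X]

b1, s2_m1 = fit_through_origin(X, y)   # residual variance near sigma_eps^2
b2, s2_m2 = fit_through_origin(Z, y)   # slope near zero, residual variance near Var(y)

# Residual variance implied for M2 when M1 is true.
implied_sigma_delta2 = beta1 ** 2 * sigma_x ** 2 + sigma_eps ** 2
```

In such a run the residual variance under the false model $M_2$ should sit close to the marginal variance of y, and well above the residual variance under $M_1$.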

In the next section the BMS and BMC procedures are compared with
respect to how well each accounts for a forecaster's model
specification uncertainty.

III.2.5 The Treatment of Model Specification Uncertainty

Assuming that the random process of interest is stationary and
that one of a proposed set of alternative models is a true
representation of the process, Geisel has shown that in the limit the BMMP
and BMS predictive distributions are the same.¹ Thus, in the limit,
the BMC and BMS approaches to forecasting are equivalent. This
result is demonstrated below.

Recalling (2.15), a BMMP can be written as a weighted average of
model predictives:

$$f(y_F \mid y, D, D_F) = \sum_{i=1}^{N} P''(M_i \mid y, D)\, f(y_F \mid M_i, y, D_i, D_{Fi}). \qquad (3.23)$$

Each of the individual model predictives, $f(y_F \mid M_i, y, D_i, D_{Fi})$, is the
distribution that would be used to characterize the random process
in question if the BMS procedure chose $M_i$.

Geisel has shown that if $M_i$ is in fact the true model, then as
sample evidence accumulates (i.e., as $n \to \infty$) $P''(M_i \mid y, D)$ approaches one.²
It follows trivially that as n approaches $\infty$, $f(y_F \mid y, D, D_F)$ approaches
$f(y_F \mid M_i, y, D_i, D_{Fi})$. Thus, since the distribution yielded by the BMC
procedure to forecast future values of y is $f(y_F \mid y, D, D_F)$, and that
yielded by the BMS procedure for forecasting purposes is
$f(y_F \mid M_i, y, D_i, D_{Fi})$, in the limit the BMC and BMS procedures are
equivalent forecasting procedures. This unsurprising result says
that in the limit, under the assumed conditions, truth is obtained,
i.e., the accumulated data would indicate with certainty the model
that had been generating the data. If such were the case, everybody
would ultimately use the same (correct) model to predict future
values of y.

¹Geisel, pp. 22-23.

²Ibid.

In both the BMS and BMC procedures the forecaster or decision
maker proposes a set of N models each of which he believes might
correctly represent the random process whose future values he is
interested in predicting. Theoretically, if he assesses a nonzero
probability for a particular model, that model should be included
in his model space. In both the BMS and BMC procedures the
forecaster assesses a prior probability mass function over the N models
in his model space. By so doing the forecaster is formally
acknowledging the fact that he is uncertain as to the correct model.
He is thus faced with a forecasting problem in which model
specification uncertainty is present and must be dealt with.

By selecting one of the N models and assuming it to be true, the
BMS approach to forecasting yields predictions that do not
appropriately reflect the forecaster's model specification uncertainty. The
BMMP of the BMC procedure, however, by utilizing all N model
predictives and their associated model probabilities acknowledges the
forecaster's model specification uncertainty and yields predictions
that do reflect this uncertainty. Forecasting via the BMS procedure
should therefore be regarded as an approximation to the "optimal"
approach to forecasting offered by the BMC procedure.

In the next section of this chapter the risk involved in
forecasting via the BMC procedure is compared to that involved in
forecasting via the BMS procedure. These risks are measured by V(BMMP)
and V(BMSP), respectively.

III.2.6 Risk Specification

Forecasts are frequently used as inputs to decision-making
problems. For example, predicted new-car demand might be used by an
auto manufacturer in determining the rate and timing of automobile
production, as well as the size of his labor force. Much of the risk
taken by a decision maker in making a decision that utilizes a
forecast stems from the possibility of forecasting error. If, for
example, the forecasted new-car demand errs on the high side, both the
manufacturer and many of his distributors might be burdened with an
excess stock of cars, leading to unnecessarily high inventory costs.
The risk passed on to a decision maker by a forecaster, called here
forecast-risk, will be assumed to be adequately measured in terms of
the variance of the forecaster's predictive distribution. Such an
assumption would be appropriate, for example, if losses associated
with forecast errors were proportional to the squared error of the
forecast.

A forecaster who utilizes the BMS or BMC procedure is admitting
that he is uncertain of the specification of the process whose future
values he wishes to predict. It has been noted above that this
uncertainty is fully reflected in a BMMP distribution but not in a
Bayesian Model Selection Predictive (BMSP) distribution. Thus,
unless V(BMMP) equals V(BMSP) or no model specification uncertainty
exists, V(BMSP) is an inappropriate measure of forecast-risk,
understating it when V(BMMP) > V(BMSP) and overstating it when
V(BMMP) < V(BMSP). Thus, the decisions that utilize a prediction
arrived at via the BMS procedure will have been made under the
assumption that the risk involved is either less than or greater than
it is in reality. The BMS procedure, therefore, has the potential to
provide the decision maker with information that may lead him to
generate inappropriate and excessively costly decisions.

As seen in Cases I, II, and III of Section III.1, V(BMMP) may be
greater than or less than V(BMSP). In certain situations it is more
likely that V(BMMP) is greater than V(BMSP), and in others it is
always the case that V(BMMP) is greater than V(BMSP). Such situations
will be discussed below.

It was noted in Section III.2.2 that a model's posterior
probability is inversely related to its estimated residual variance, $S_i^2$,
and, therefore, directly related to its coefficient of determination,
$R_i^2$. Thus, if $M_1$'s posterior probability is high relative to $M_2$'s
posterior probability, then $S_1^2$ is low relative to $S_2^2$, and $R_1^2$ is high
relative to $R_2^2$. If such were the case, it could be said that the
accumulated evidence supports $M_1$ rather than $M_2$ as being the more likely
data-generating source. Accordingly, a forecaster might be tempted to
invoke the BMS procedure or the maximize-$R^2$ rule and choose $M_1$ and its
predictive distribution with which to forecast $y_F$. But in such cases
it is more likely that V(BMMP) > V(BMSP) than it would be if the
evidence did not so clearly support one model or the other.¹ This is
explained below.

CLAIM: If $\dfrac{|\sigma_1^2 - \sigma_2^2|}{(\mu_2 - \mu_1)^2}$ remains constant, the larger the difference in
$P''(M_1)$ and $P''(M_2)$, the more likely that V(BMMP) > V(BMSP).

DISCUSSION: Zellner has shown that for a normal regression model
(see the assumptions of Section III.2.4) with diffuse prior information
on the parameters of the model, V(BMSP), also denoted $\sigma_i^2$, is defined as
follows:²

$$\sigma_i^2 = \frac{(n - 1)S_i^2}{(n - 3)}\left[\frac{D_{Fi}^2}{\sum_{j=1}^{n} D_{ji}^2} + 1\right] \qquad (3.24)$$

where n is the sample size, i.e., the number of y values observed to
date; the $D_{ji}$'s are the values of model i's independent (explanatory)
variable, $D_i$, observed to date; $D_{Fi}$ is the value of $D_i$ that corresponds
to the next y value generated by the process in question; and $S_i^2$ is the
estimated residual variance of model i. It can be seen from (3.24)
that $\sigma_i^2$ is proportional to $S_i^2$.

¹From (3.12) it can be seen that when, say, $P''(M_i)$ is close to
one, the difference between V(BMMP) and V(BMSP) is of no practical
significance. Under such circumstances a comparison of V(BMMP)
and V(BMSP) serves little purpose.

²Zellner, pp. 72-74.
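
Formula (3.24) can be evaluated directly for a single-regressor model. The function name and the sample values below are hypothetical and serve only to show the proportionality to the estimated residual variance.

```python
def predictive_variance(s2, d_obs, d_future):
    """Predictive variance of (3.24) for a one-regressor model.

    s2       -- estimated residual variance of the model
    d_obs    -- observed values of the model's explanatory variable
    d_future -- value of that variable for the next y to be generated
    """
    n = len(d_obs)
    if n <= 3:
        raise ValueError("(3.24) requires n > 3")
    sum_sq = sum(d * d for d in d_obs)
    return (n - 1) * s2 / (n - 3) * (d_future ** 2 / sum_sq + 1.0)


# Made-up sample: five observations of the explanatory variable.
d_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
var_a = predictive_variance(2.0, d_obs, d_future=3.0)

# Doubling the estimated residual variance doubles the predictive variance.
var_b = predictive_variance(4.0, d_obs, d_future=3.0)
```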

It is known from Geisel's work that $P''(M_i)$ is inversely
proportional to $S_i^2$.¹ Thus, the larger $|P''(M_1) - P''(M_2)|$, the larger is
$|S_2^2 - S_1^2|$.

Conditions 2a and 2b of Section III.1 provide necessary and
sufficient conditions for V(BMMP) > V(BMSP). The conditions are that if,
say, $\sigma_1^2 < \sigma_2^2$, then $P''(M_1)$ must be greater than .5 or $P''(M_2)$ must be
greater than $\dfrac{\sigma_2^2 - \sigma_1^2}{(\mu_2 - \mu_1)^2}$. Thus, other things equal,² if $P''(M_1) < .5$,
then the larger is $[P''(M_2) - P''(M_1)]$, the more likely it is that
$P''(M_2)$ satisfies either condition 2a or 2b, i.e., the more likely it
is that V(BMMP) > V(BMSP). Of course if $P''(M_1) > .5$, then V(BMMP) is
greater than V(BMSP) regardless of how large $[P''(M_1) - P''(M_2)]$ is.

The phrase "other things equal" used above refers specifically
to the ratio of $|\sigma_1^2 - \sigma_2^2|$ to $(\mu_2 - \mu_1)^2$. What is being said is that
given two model selection situations in which the absolute value of
the ratio of $(\sigma_1^2 - \sigma_2^2)$ to $(\mu_2 - \mu_1)^2$ is the same in both, but in which
$|P''(M_1) - P''(M_2)|$ is larger in the first situation than it is in the
second, it is more likely in the first situation that
V(BMMP) > V(BMSP).

¹See Section III.2.2.

²The "other things" are clarified in the next paragraph.
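
The conditions just stated can be checked numerically for the two-model case. The sketch below rests on assumptions not spelled out in this section: it uses the standard variance of a two-component mixture, $p_1\sigma_1^2 + p_2\sigma_2^2 + p_1 p_2(\mu_2 - \mu_1)^2$, for V(BMMP), takes BMS to choose the model with the higher posterior probability, and assumes $\sigma_1^2 < \sigma_2^2$. All names and numbers are hypothetical.

```python
def mixture_variance(p1, mu1, var1, mu2, var2):
    """Variance of a two-component mixture with weights p1 and 1 - p1."""
    p2 = 1.0 - p1
    return p1 * var1 + p2 * var2 + p1 * p2 * (mu2 - mu1) ** 2


def bms_variance(p1, var1, var2):
    """Predictive variance of the model BMS would select."""
    return var1 if p1 > 0.5 else var2


def condition_holds(p1, mu1, var1, mu2, var2):
    """Condition from the text, assuming var1 < var2: either P''(M1) > .5
    or P''(M2) exceeds (var2 - var1) / (mu2 - mu1)**2."""
    p2 = 1.0 - p1
    return p1 > 0.5 or p2 > (var2 - var1) / (mu2 - mu1) ** 2


# Hypothetical predictive means and variances with var1 < var2.
mu1, var1, mu2, var2 = 0.0, 1.0, 2.0, 4.0

results = []
for p1 in (0.1, 0.2, 0.4, 0.45, 0.6, 0.9):
    direct = mixture_variance(p1, mu1, var1, mu2, var2) > bms_variance(p1, var1, var2)
    results.append((p1, direct, condition_holds(p1, mu1, var1, mu2, var2)))
```

For these values the direct comparison of the two variances and the stated condition agree at every probability tried, including the cases in which V(BMMP) falls below V(BMSP).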

This claim can be supported from another angle. Since $\sigma_i^2$ is
proportional to $S_i^2$, it can be said that the smaller, say, $S_1^2$ is in
relation to $S_2^2$, the more likely it is that $\sigma_1^2 < \sigma_2^2$. By the Geisel
result discussed in Section III.2.2, the smaller is $S_1^2$ in relation
to $S_2^2$, the larger is $P''(M_1)$ in relation to $P''(M_2)$. Thus, the smaller
$S_1^2$ is in relation to $S_2^2$, the more likely it is that the model with
the lower predictive variance will be chosen by the BMS procedure.
Therefore, by Theorem 1 of Section III.1, the more likely it is that
V(BMMP) is greater than V(BMSP).

There is a special forecasting case worth noting in which V(BMMP)
is greater than V(BMSP) no matter which model the BMS procedure
chooses. It is a result of the following lemma.

LEMMA 3: If $\dfrac{X_F^2}{\sum_{j=1}^{n} X_j^2} = \dfrac{Z_F^2}{\sum_{j=1}^{n} Z_j^2}$, then the model with the lower estimated
residual variance, $S_i^2$, also has the lower predictive variance, $\sigma_i^2$.

PROOF: Proof of this lemma follows directly from the definition
of $\sigma_i^2$. Recalling (3.24) and the model space assumptions of Section
III.2.4, $\sigma_1^2$ and $\sigma_2^2$ are defined as follows:

$$\sigma_1^2 = \frac{(n - 1)S_1^2}{(n - 3)}\left[\frac{X_F^2}{\sum_{j=1}^{n} X_j^2} + 1\right] \qquad (3.25)$$

$$\sigma_2^2 = \frac{(n - 1)S_2^2}{(n - 3)}\left[\frac{Z_F^2}{\sum_{j=1}^{n} Z_j^2} + 1\right] \qquad (3.26)$$

Thus, since n, the sample size, is a constant, and $X_F^2 / \sum_{j=1}^{n} X_j^2$ is assumed
equal to $Z_F^2 / \sum_{j=1}^{n} Z_j^2$, $\sigma_1^2$ and $\sigma_2^2$ are proportional to $S_1^2$ and $S_2^2$, respectively.

Since the model chosen by BMS has the smaller estimated residual
variance, by Lemma 3 it also has the lower predictive variance. Thus if
Lemma 3 holds, by Theorem 2 of Section III.1 V(BMMP) > V(BMSP). In
this special case, a decision maker using a forecast obtained via the
BMS procedure would be making a decision that fails to recognize the
full extent of the uncertainty involved in the outcome of his decision.

Under the assumptions of Section III.2.4, Zellner has shown that
the posterior expected value of the residual variance of, say, Model 1
is¹

$$E''(\sigma_\varepsilon^2) = \frac{(n - 1)S_1^2}{(n - 3)}, \qquad (3.27)$$

and Raiffa and Schlaifer have shown that the posterior variance of,
say, $\beta_1$ is²

$$V''(\beta_1) = \frac{(n - 1)S_1^2}{(n - 3)\sum_{j=1}^{n} X_j^2}. \qquad (3.28)$$

Thus, recalling (3.25), the predictive variance of model 1 may be
written

$$\sigma_1^2 = V''(\beta_1)X_F^2 + E''(\sigma_\varepsilon^2). \qquad (3.29)$$

¹Zellner, p. 62.

²Howard Raiffa and Robert Schlaifer, Applied Statistical Decision
Theory (Cambridge, Mass.: The M.I.T. Press, 1961), pp. 349-55.

The  following  lemma,  based  on  the  above  facts,  is  offered  to 
further  explain  the  relationship  between  V(BMMP)  and  V(BMSP): 

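LEMMA 4: If $E''(\sigma_\varepsilon^2) < E''(\sigma_\delta^2)$, $\sum_{j=1}^{n} X_j^2 \geq \sum_{j=1}^{n} Z_j^2$, and $X_F^2 \leq Z_F^2$, then
$V''(\beta_1) < V''(\beta_2)$ and $\sigma_1^2 < \sigma_2^2$.¹

PROOF: From equation (3.27) it can be seen that $E''(\sigma_\varepsilon^2)$ and $E''(\sigma_\delta^2)$
are proportional to $S_1^2$ and $S_2^2$, respectively. Thus, $E''(\sigma_\varepsilon^2) < E''(\sigma_\delta^2)$
means that $S_1^2 < S_2^2$. From (3.28) it can be seen that $V''(\beta_1)$ and $V''(\beta_2)$
are inversely related to $\sum_{j=1}^{n} X_j^2$ and $\sum_{j=1}^{n} Z_j^2$, respectively. Consequently,
if $S_1^2 < S_2^2$ and $\sum_{j=1}^{n} X_j^2 \geq \sum_{j=1}^{n} Z_j^2$, it can be seen from (3.28) that
$V''(\beta_1) < V''(\beta_2)$. Thus, since $V''(\beta_1) < V''(\beta_2)$, $E''(\sigma_\varepsilon^2) < E''(\sigma_\delta^2)$, and
$X_F^2 \leq Z_F^2$, it follows from equation (3.29) that $\sigma_1^2 < \sigma_2^2$.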

If the conditions of Lemma 4 are fulfilled, the model selected
by the BMS procedure will have the lower predictive variance and by
Theorem 2 of Section III.1, V(BMMP) > V(BMSP). Thus, as is the case
when Lemma 3 holds, a decision maker using a forecast obtained via the
BMS procedure would be making a decision which fails to recognize the
full extent of the uncertainty involved in the outcome of his decision.
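
The decomposition in (3.27) through (3.29) and the ordering asserted by Lemma 4 can be checked on a small made-up example. The function names and data below are hypothetical; the formulas are those given above for a one-regressor model through the origin.

```python
def posterior_mean_residual_variance(s2, n):
    """E''(residual variance) as in (3.27)."""
    return (n - 1) * s2 / (n - 3)


def posterior_variance_coefficient(s2, d_obs):
    """V''(regression coefficient) as in (3.28)."""
    n = len(d_obs)
    return (n - 1) * s2 / ((n - 3) * sum(d * d for d in d_obs))


def predictive_variance(s2, d_obs, d_future):
    """Predictive variance assembled as in (3.29)."""
    n = len(d_obs)
    return (posterior_variance_coefficient(s2, d_obs) * d_future ** 2
            + posterior_mean_residual_variance(s2, n))


# Made-up data satisfying the hypotheses of Lemma 4:
# S1^2 < S2^2, sum X_j^2 >= sum Z_j^2, and X_F^2 <= Z_F^2.
X_obs, Z_obs = [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 2.0, 2.0, 3.0]
s2_1, s2_2 = 1.0, 2.0
x_f, z_f = 1.0, 2.0

v1 = posterior_variance_coefficient(s2_1, X_obs)
v2 = posterior_variance_coefficient(s2_2, Z_obs)
sigma2_1 = predictive_variance(s2_1, X_obs, x_f)
sigma2_2 = predictive_variance(s2_2, Z_obs, z_f)
lemma4_ordering = v1 < v2 and sigma2_1 < sigma2_2
```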


¹$S_1^2$ and $S_2^2$ could, of course, be substituted for $E''(\sigma_\varepsilon^2)$ and $E''(\sigma_\delta^2)$,
respectively, but one of the goals of this lemma is to explain the
relationship of V(BMMP) and V(BMSP) via the, perhaps, more easily
interpretable definition of $\sigma_1^2$: $\sigma_1^2 = V''(\beta_1)X_F^2 + E''(\sigma_\varepsilon^2)$.

The next section examines the decision-maker's posterior expected
losses from utilizing BMS and BMC-generated predictions of $y_F$.

III.2.7 A Comparison of Expected Losses

Given a loss function, sample y values, and a predictive
distribution of $y_F$, a forecaster can find an optimal point estimate for $y_F$
by minimizing the decision-maker's posterior expected loss:

$$\min_{\hat{y}} \int L(y_F, \hat{y})\, f(y_F \mid y, D, D_F)\, dy_F \qquad (3.30)$$

It is well known that if a quadratic loss function is used in (3.30),
the solution to the minimization problem is the mean of $f(y_F \mid y, D, D_F)$.
If the forecaster chooses to forecast via the BMS procedure he would
utilize a model predictive, $f(y_F \mid M_i, y, D_i, D_{Fi})$, to solve (3.30). The
solution to (3.30) and his point estimate for $y_F$ would therefore be
the mean of his model predictive, $\mu_i$. If he chooses to forecast via
the BMC procedure, he would use a BMMP, $f(y_F \mid y, D, D_F)$, to solve (3.30)
and his solution and point estimate would be the mean of the BMMP, $\mu$.
As has been mentioned several times earlier in this chapter, however,
a forecaster who opts for forecasting via BMS is not making use of
all the available information about $y_F$. The Bayesian Mixed Model
Predictive (BMMP) of the BMC procedure reflects all the available
information, whereas a BMSP is merely an approximation to the BMMP.
Therefore, the appropriate predictive distribution to use in (3.30) is
a BMMP. Consequently, the optimal solution to (3.30) is $\mu$, the mean
of the BMMP, i.e., $\hat{y} = \mu$. Only if the forecaster and/or decision
maker assess a probability of one for a particular model being the
true model of $y_F$'s process would a single model predictive provide
full information to the forecaster and/or decision maker and, hence,
an optimal solution to (3.30).

Since the appropriate distribution to use in solving (3.30) is a
BMMP, the decision-maker's posterior expected loss using a BMS
forecast, $\mu_i$, is greater than his posterior expected loss using a BMC
forecast, $\mu$:

$$EL(\mu_i) = \int L(y_F, \mu_i)\, f(y_F \mid y, D, D_F)\, dy_F > EL(\mu) = \int L(y_F, \mu)\, f(y_F \mid y, D, D_F)\, dy_F. \qquad (3.31)$$

This follows from the fact that it is $\mu$, and not $\mu_i$, that minimizes

$$\int L(y_F, \hat{y}_F)\, f(y_F \mid y, D, D_F)\, dy_F. \qquad (3.32)$$


When $P(M_i) > 0$, $i = 1, 2$, then only if $\mu_1 = \mu_2$ would, say, $\mu_1$ minimize
(3.32), since then $\mu_1 = \mu_2 = P''(M_1)\mu_1 + P''(M_2)\mu_2 = \mu$. Of course, if for
some $i$ $P(M_i) = 1$, then $\mu_i = \mu$ also. But in the context of this
dissertation, this case is of no interest.


Let C(BMC) and C(BMS) stand for the costs required to forecast
with BMC and BMS, respectively.2 Then, assuming that the decision
maker's loss function and the cost functions C(BMC) and C(BMS) can be

1 Note that when $P(M_i) = 1$, the BMSP and BMMP distributions are the
same.

2 In general C(BMC) and C(BMS) cannot be computed without going
through the actual computations required by the BMC and BMS
procedures.

meaningfully compared, if experimentation with the BMC and BMS
procedures shows that in general

$$EL(\mu_i) - EL(\mu) > C(BMC) - C(BMS),$$

it is materially as well as theoretically advantageous for the
forecaster to use the BMC procedure rather than the BMS procedure.
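The comparison above can be made concrete with a small numerical sketch. It relies on the standard identity that, under quadratic loss, the expected loss of a forecast equals the predictive variance plus the squared distance between the forecast and the predictive mean, so the excess loss of a BMS forecast is the squared gap between the model mean and the BMMP mean. The posterior model probabilities, predictive moments, and forecasting costs below are hypothetical values chosen only for illustration; they do not come from the dissertation.

```python
# Hypothetical posterior quantities for two models (illustrative only).
post_prob = [0.6, 0.4]   # P''(M_1), P''(M_2)
mu = [10.0, 14.0]        # model predictive means mu_1, mu_2
var = [4.0, 9.0]         # model predictive variances sigma_1^2, sigma_2^2

# Mean and variance of the mixed-model predictive (BMMP).
mu_mix = sum(p * m for p, m in zip(post_prob, mu))
var_mix = sum(p * (v + (m - mu_mix) ** 2)
              for p, m, v in zip(post_prob, mu, var))


def expected_quadratic_loss(forecast):
    """E[(y_F - forecast)^2], with the expectation taken over the BMMP."""
    return var_mix + (forecast - mu_mix) ** 2


el_bmc = expected_quadratic_loss(mu_mix)  # BMC forecast: mean of the BMMP
el_bms = expected_quadratic_loss(mu[0])   # BMS forecast if M_1 is selected
excess_loss = el_bms - el_bmc             # equals (mu_1 - mu)^2

# Hypothetical forecasting costs, expressed in the same units as the loss.
cost_bmc, cost_bms = 1.5, 1.0
bmc_worthwhile = excess_loss > (cost_bmc - cost_bms)
```

With these assumed numbers the excess loss from the BMS forecast exceeds the extra cost of the BMC procedure, so `bmc_worthwhile` is true; different costs or closer model means could reverse the conclusion.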

Future values of a random variable are typically predicted using
point or interval estimates. The implications of making point and
interval estimates via the BMS procedure as opposed to the BMC
procedure are discussed in the next two sections.

III.2.8  Implications  for  Point  Estimation 

The point estimate of a future value of some random process will
be denoted by $\hat{y}_F$. The use of loss functions to determine optimal
point estimates was discussed in the preceding section of this chapter.
If a loss function can be specified by the forecaster and/or decision
maker, it should be used to determine $\hat{y}_F$. Frequently, however, loss
functions are too costly to develop and predictions must be made
without the information that a loss function provides. In such cases
forecasters usually examine $y_F$'s predictive distribution and choose a
measure of its central tendency as their estimate of $y_F$. Their logic
is that central tendency measures are usually in the high density region
of the distribution and will not err significantly even if the actual
$y_F$ falls in a tail of $y_F$'s predictive distribution. Further, it is
well known that commonly used loss functions often result in mean,
median, or modal estimates of parameters.

In the preceding section, it was noted that if the BMS procedure
and a quadratic loss function are utilized for forecasting, $\hat{y}_F = \mu_i$.
However, even if a BMS forecaster does not have a loss function with
which to work, he might again choose the mean of the chosen model
predictive, $\mu_i$, as his point estimate of $y_F$. In either of these cases, if
the BMS procedure chooses $M_1$ and $\mu_1 \neq \mu_2$, then, for reasons explained
below, it can be said that the forecaster's point estimate is
inappropriately high or low with probability one. For example, if $\mu_1 < \mu$ and
$\mu_1$ is used by a BMS forecaster to predict $y_F$, $\mu_1$ is said to be an
inappropriately low prediction of $y_F$.

Suppose it is the next y value that the forecaster would like to
predict. By assessing nonzero model probabilities for $M_1$ and $M_2$, as is
done in both the BMS and BMC procedures, the forecaster/decision maker
is acknowledging that he believes the next observation could be
generated by either $M_1$ or $M_2$. A prediction of the next $y_F$ value should
acknowledge this uncertainty. But forecasting procedures that utilize
the BMS procedure do not optimally account for this sort of uncertainty
(model specification uncertainty) because they do not appropriately
reflect the possibility that a rejected model may be the true model.
Thus, in the example of the preceding paragraph, $\mu_1$ is said to be an
inappropriately low forecast because it does not appropriately reflect
the fact that $y_F$ may be generated by $M_2$.

1 Since $\mu = P''(M_1)\mu_1 + P''(M_2)\mu_2$ and $P''(M_1), P''(M_2) > 0$, $\mu_1 \neq \mu_2$
means $\mu_1 \neq \mu_2 \neq \mu$.

Forecasts made utilizing the BMC procedure do reflect model
specification uncertainty. $\mu$, the mean of the BMMP distribution, is
an example of a BMC-generated prediction. As can be seen by examining
its definition, $\mu$ reflects the belief that $y_F$ may be generated by
either $M_1$ or $M_2$:

$$\mu = P''(M_1)\mu_1 + P''(M_2)\mu_2.$$

Since the decision maker's predictive distribution is a mixture of
the model predictives, his optimal estimator will arise from the
mixture as well, and in this case will be $\mu$. It is just as appropriate
to use $\mu$ when model specification uncertainty exists as it is to use,
say, $\mu_1$ when it is known that $y_F$ will be generated by $M_1$.

If a forecaster's loss function is asymmetric, the mean of $y_F$'s
predictive distribution would not be appropriate for forecasting $y_F$.
Suppose his losses are best represented by an asymmetric linear loss
function and model specification uncertainty exists. Then his optimal
point estimate for $y_F$ would be a fractile of $y_F$'s BMMP distribution.
A BMS forecaster utilizing an asymmetric linear loss function would
use a fractile of the BMSP distribution. If the asymmetric linear loss
function describes losses from underestimating $y_F$ as being greater than
losses from overestimating $y_F$, the BMS forecaster's point estimate
would be a fractile of the BMSP distribution which is greater than the


If the linear loss function were symmetric, the optimal point
estimate would be the median of the BMMP.

mean. In such cases the BMS forecaster may seriously underestimate
$y_F$ and incur a large loss while thinking he is protecting against
such an occurrence. Suppose $\mu_1 < \mu$ and the BMS procedure selects $M_1$.
Then, if the forecaster chooses to estimate $y_F$ with a fractile of $M_1$'s
BMSP distribution which is less than $\mu$, say, the .7 fractile, the BMSP
reflects his probability of underestimating $y_F$ as being only .3. But,
if the .7 fractile of the BMSP distribution is less than $\mu$, the BMMP
distribution reflects his probability of underestimating $y_F$ as being
greater than .5. Thus, a BMS forecaster may believe he is protecting
against underestimating $y_F$ when in fact he has a higher probability of
an underestimate than an overestimate.
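The fractile argument can be illustrated numerically. The sketch below substitutes normal distributions for the Student model predictives so that only the Python standard library is needed; that substitution, the model probabilities, and the predictive means and standard deviations are all assumptions made for illustration.

```python
from statistics import NormalDist

# Normal stand-ins for the two Student model predictives (an assumption
# made so that only the standard library is required).
m1 = NormalDist(mu=0.0, sigma=1.0)    # predictive under M_1, selected by BMS
m2 = NormalDist(mu=10.0, sigma=1.0)   # predictive under M_2
p1, p2 = 0.55, 0.45                   # hypothetical P''(M_1), P''(M_2)

mu_mix = p1 * m1.mean + p2 * m2.mean  # mean of the BMMP
forecast = m1.inv_cdf(0.7)            # .7 fractile of the BMSP

# Probability that y_F exceeds the forecast, i.e. of an underestimate,
# first as the BMSP reports it and then as the BMMP reports it.
under_bmsp = 1.0 - m1.cdf(forecast)
under_bmmp = (p1 * (1.0 - m1.cdf(forecast))
              + p2 * (1.0 - m2.cdf(forecast)))
```

In this assumed setting the forecast lies below the BMMP mean; the BMSP reports a .3 probability of underestimating $y_F$, while the BMMP reports a probability above .5.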

The results of this section were generated via a comparison of
BMC and BMS forecasts. It should be noted, however, that point
estimates determined by any procedure which utilizes a single model that
has been selected from a set of viable models will typically be
misplaced. This is due to the fact that use of a single model, however
selected, has the effect of ignoring information provided by those
remaining models which have positive posterior probability.

III.2.9 Implications for Interval Estimation

The  procedure  of  predicting  that  a  future  value  of  a  random 

process  will  take  on  a  value  between  two  specified  real  numbers  with 

1 Raiffa and Schlaifer, p. 345, have shown that the predictive
distributions for $y_F$ yielded by $M_1$ and $M_2$ are Student. Since the
Student distribution is unimodal and symmetric, its mean and median
are equal.

some positive probability is referred to as Bayesian interval
estimation. The interval represented by the two given numbers is called
a credible interval. Often, a Bayesian will choose as his credible
interval a Highest Posterior Density (HPD) region. Denoting $y_F$'s
predictive distribution as $f(y_F \mid y)$, an interval $I$ in the domain of
$y_F$ is called an HPD region of content $1 - \alpha$ if

a) $P(y_F \in I) = 1 - \alpha$;

b) $y_{F1} \in I$ and $y_{F2} \notin I$ implies $f(y_{F1} \mid y) \geq f(y_{F2} \mid y)$.2

BMS interval forecasts of $y_F$ are determined from the predictive
distribution of $y_F$ generated by the model chosen by the BMS procedure,
i.e., a Bayesian Model Selection Predictive (BMSP). BMC interval
forecasts of $y_F$ are determined from the appropriate Bayesian Mixed Model
Predictive (BMMP).

Recall that under the assumptions of Section III.2.4, $M_1$ and $M_2$
define unimodal, symmetric distributions (Student distributions).
Accordingly, an HPD credible interval determined from $M_i$'s BMSP will be
centered at $\mu_i$. Thus, when model specification uncertainty exists and
$\mu_1 \neq \mu_2$, the midpoint of a BMS credible interval is inappropriately
high or low in the same sense as BMS point estimates were in the


1 Bayesian methods for optimal interval estimates exist when, as in
the case of point estimation, appropriate loss functions may be
specified. See R. L. Winkler, "Decision-Theoretic Approach to Interval
Estimation," Journal of the American Statistical Association, 67 (1972),
187-191.

2 George E. P. Box and George C. Tiao, Bayesian Inference in
Statistical Analysis (Reading, MA: Addison-Wesley, 1973), p. 123.

preceding section. The discussion of this phenomenon with respect to
point estimates in the preceding section applies equally well here.

Under the assumptions that $M_1$ and $M_2$ are normal regression models,
$\mu_1 \neq \mu_2$, and $P'(M_1), P'(M_2) > 0$ (see Section III.2.4), the BMMP
distribution is bimodal. Accordingly, an HPD BMC credible region will
frequently consist of two intervals: one with midpoint $\mu_1$, the other with
midpoint $\mu_2$. Interval forecasts that are comprised of more than one
interval will be referred to as split-interval forecasts or split
credible intervals. An HPD split credible interval serves to warn a
decision maker that it is highly probable that $y_F$ will take on a value
in one of two or more noncontiguous regions.
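A split credible interval can be exhibited with a simple grid approximation. The sketch below is not the dissertation's procedure: it approximates an HPD region by admitting grid points in order of decreasing density until the requested content is reached, and it uses normal stand-ins for the Student model predictives. The function name `hpd_region` and all numerical values are hypothetical.

```python
from statistics import NormalDist


def hpd_region(pdf, lo, hi, content=0.95, n=20001):
    """Approximate the HPD region of `pdf` on [lo, hi] as a list of intervals."""
    step = (hi - lo) / (n - 1)
    grid = [lo + k * step for k in range(n)]
    dens = [pdf(x) for x in grid]
    # Admit grid points in order of decreasing density until the
    # accumulated probability mass reaches the requested content.
    order = sorted(range(n), key=lambda k: dens[k], reverse=True)
    keep, mass = [False] * n, 0.0
    for k in order:
        if mass >= content:
            break
        keep[k] = True
        mass += dens[k] * step
    # Merge runs of adjacent admitted points into intervals.
    intervals, start = [], None
    for k in range(n):
        if keep[k] and start is None:
            start = grid[k]
        if start is not None and (k == n - 1 or not keep[k + 1]):
            intervals.append((start, grid[k]))
            start = None
    return intervals


# Normal stand-ins for the model predictives; weights are hypothetical.
m1, m2 = NormalDist(0.0, 1.0), NormalDist(6.0, 1.0)
p1, p2 = 0.5, 0.5


def bmmp_pdf(y):
    """Density of the two-component mixed-model predictive."""
    return p1 * m1.pdf(y) + p2 * m2.pdf(y)


region = hpd_region(bmmp_pdf, -6.0, 12.0)
```

For these well-separated assumed predictives, `region` contains two disjoint intervals, one around each model mean, which is the split-interval forecast described above.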

The following two lemmas demonstrate how a credible interval
formed using a BMSP can be misleading when model specification
uncertainty exists. In Lemma 5, the intersection of the BMSP's of $M_1$ and
$M_2$ between their modes is referred to as the inter-modal intersection.
The $y_F$ value that corresponds to the inter-modal intersection will be
denoted $y_F^I$.

LEMMA 5: Let $\mu_1 \neq \mu_2$ and suppose BMS chooses model $i$. If the length
of a credible interval formed using the BMSP is less than or equal to
$2|\mu_i - y_F^I|$, then the BMS credible interval overstates the probability
that it will cover $y_F$.

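PROOF: Recall that the BMMP is a mixture of predictive
distributions generated by $M_1$ and $M_2$:

$$f(y_F \mid y, D, D_F) = P''(M_1 \mid y, D)\, f(y_F \mid M_1, y, D_1, D_{F1}) + P''(M_2 \mid y, D)\, f(y_F \mid M_2, y, D_2, D_{F2}).$$

Thus, at any value of $y_F$ the BMMP density is a weighted average of
the two model predictive densities and cannot exceed the larger of them:

$$\max_i f(y_F \mid M_i, y, D_i, D_{Fi}) \geq f(y_F \mid y, D, D_F).$$

When $P''(M_i \mid y, D) \neq 1$ and the two model densities differ at $y_F$, then

$$\max_i f(y_F \mid M_i, y, D_i, D_{Fi}) > f(y_F \mid y, D, D_F).$$

If, say, $\mu_1 < \mu_2$ and $y_F < y_F^I$, then

$$f(y_F \mid M_1, y, D_1, D_{F1}) > f(y_F \mid y, D, D_F).$$

Thus, the probability of an interval centered on $\mu_1$ of length less
than $2|\mu_1 - y_F^I|$ containing $y_F$ is greater when the probability is
evaluated via $f(y_F \mid M_1, y, D_1, D_{F1})$, rather than $f(y_F \mid y, D, D_F)$.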

If the conditions of Lemma 5 are fulfilled, the probability of a
BMS credible interval covering $y_F$ is actually smaller than claimed by
the forecaster using the BMS credible interval. Thus, the BMS credible
interval overstates the probability of $y_F$ being covered and therefore
understates the risk involved in using the interval forecast for
decision-making purposes. Notice that since $f(y_F \mid y, D, D_F) >
f(y_F \mid M_1, y, D_1, D_{F1})$ when $y_F > y_F^I$, it is unclear whether a BMS credible
interval of length greater than $2|\mu_1 - y_F^I|$ understates or overstates
the probability that it will cover $y_F$.
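The overstatement described in Lemma 5 can be checked numerically. In the sketch below the Student model predictives are replaced by normal distributions with equal variances, so that the inter-modal intersection lies midway between the two means; this simplification and all numerical values are assumptions made only for illustration.

```python
from statistics import NormalDist

# Normal stand-ins for the model predictives (illustrative assumption).
m1 = NormalDist(0.0, 1.0)   # BMSP of the selected model M_1
m2 = NormalDist(4.0, 1.0)   # predictive of the rejected model M_2
p1, p2 = 0.6, 0.4           # hypothetical P''(M_1), P''(M_2)

# With equal predictive variances the two model densities cross midway
# between their modes.
y_cross = 0.5 * (m1.mean + m2.mean)

# Central 90 percent credible interval of the BMSP, centered on mu_1.
half = m1.inv_cdf(0.95) - m1.mean
lower, upper = m1.mean - half, m1.mean + half
short_enough = 2 * half <= 2 * abs(m1.mean - y_cross)  # condition of Lemma 5

# Coverage claimed by the BMSP versus coverage under the BMMP.
claimed = m1.cdf(upper) - m1.cdf(lower)
actual = p1 * claimed + p2 * (m2.cdf(upper) - m2.cdf(lower))
```

Under these assumptions the interval satisfies the length condition of the lemma, the BMSP claims 90 percent coverage, and the BMMP assigns the same interval a probability of only about 54 percent.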

LEMMA 6: If $\mu_1 = \mu_2$ and the BMS procedure chooses the model with the
higher (lower) predictive variance, then a BMS credible interval
understates (overstates) the probability that it will cover $y_F$.

PROOF: Theorem 3 of Section III.1 showed that if the BMS
procedure chooses the model with the higher predictive variance, then V(BMMP)
may be less than, greater than, or equal to V(BMSP). Recall that

$$V(BMMP) = \sigma^2 = P''(M_1 \mid y, D)\sigma_1^2 + P''(M_2 \mid y, D)\sigma_2^2 + P''(M_1 \mid y, D)(\mu_1 - \mu)^2 + P''(M_2 \mid y, D)(\mu_2 - \mu)^2$$

and

$$E(BMMP) = \mu = P''(M_1 \mid y, D)\mu_1 + P''(M_2 \mid y, D)\mu_2.$$

Thus, $\mu_1 = \mu_2$ implies that $\mu_1 = \mu_2 = \mu$ and

$$\sigma^2 = P''(M_1 \mid y, D)\sigma_1^2 + P''(M_2 \mid y, D)\sigma_2^2.$$

Therefore, if the BMS procedure chooses the model with the higher
predictive variance, V(BMMP) < V(BMSP). Under the assumptions of this
chapter, a BMSP distribution is Student and, therefore, unimodal and
symmetric. Accordingly, a 95 percent credible interval, say, formed
using the BMSP distribution will be wider than a 95 percent credible
interval formed using the BMMP distribution. It follows that the
probability of $y_F$ being covered by a BMMP (i.e., BMC) credible
interval of the same size as a 95 percent BMS credible interval is greater
than .95. Thus, it may be said that when the conditions of Lemma 6
are fulfilled, a BMS credible interval understates its probability
of covering $y_F$.
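A brief numerical check of the equal-means case follows. As before, normal distributions stand in for the Student model predictives, and the standard deviations and model probabilities are hypothetical.

```python
from statistics import NormalDist

# Equal predictive means, unequal predictive variances (illustrative values).
m1 = NormalDist(0.0, 2.0)   # higher predictive variance, selected by BMS
m2 = NormalDist(0.0, 1.0)
p1, p2 = 0.5, 0.5           # hypothetical P''(M_1), P''(M_2)

var_bmsp = m1.variance
# The (mu_i - mu)^2 terms vanish because mu_1 = mu_2 = mu.
var_bmmp = p1 * m1.variance + p2 * m2.variance

# 95 percent BMS credible interval is (-half, half).
half = m1.inv_cdf(0.975)
claimed = m1.cdf(half) - m1.cdf(-half)
actual = p1 * claimed + p2 * (m2.cdf(half) - m2.cdf(-half))
```

Here V(BMMP) is 2.5 against V(BMSP) of 4, and the interval that the BMSP labels as 95 percent carries a probability of roughly 97.5 percent under the BMMP, in line with the understatement asserted by the lemma.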

In this chapter, it has been shown that when model specification
uncertainty is present, the appropriate distribution with which to
characterize a data-generating process is the BMMP of the BMC
procedure. Failure to use the BMMP when model specification uncertainty
exists results in two interesting and seemingly contradictory effects.
First, in using a single model, however selected, information provided
by the remaining models which have positive posterior probability is
ignored. Second, in ignoring available information about model
specification uncertainty, the forecaster behaves in many cases as if he
is facing a lesser degree of uncertainty than is actually the case.
Thus, the forecaster simultaneously discards relevant information
and behaves as if he possesses more information than is actually
possessed. This phenomenon was noted in both point and interval
forecasting situations.

In the next chapter, the BMC procedure is applied to single-period
economic control problems.


Christopher B. Barry and P. George Benson, "Specification
Uncertainty in Economic Forecasting and Control Models," University of
Minnesota, Graduate School of Business Administration, Working Paper
No. 35 (February, 1977), p. 7.


CHAPTER  IV 

MODEL  SPECIFICATION  UNCERTAINTY  IN  SINGLE-PERIOD 
ECONOMIC  CONTROL  PROBLEMS 


In  Chapter  III  the  consequences  of  forecasting  with  and  without 
considering  model  specification  uncertainty  were  examined.  Given  the 
existence  of  model  specification  uncertainty,  it  was  concluded  that  the 
BMC  procedure  was  an  appropriate  procedure  to  utilize  in  predicting 
future  values  of  a  random  process.  In  this  chapter,  the  BMC  procedure 
is  applied  to  single-period  economic  control  problems.  In  particular, 
the  BMC  procedure  will  be  used  to  find  both  certainty-equivalent  and 
optimal  analytic  solutions  to  single-period  control  problems.  In  both 
cases,  control  solutions  will  be  derived  which  take  into  consideration 
costs  that  may  be  incurred  by  a  controller  as  a  result  of  his  employing 
a  particular  instrument  (controllable  variable)  to  help  control  a 
random  process. 

By using the BMC procedure to solve economic control problems,
control solutions need not be artificially conditioned on the
assumption that a particular econometric model is in fact an accurate
characterization of the process whose control is desired. Instead, a
controller's model specification uncertainty is reflected in his
control solutions, i.e., in his decisions concerning the levels or rates
at which to set his controllable variables. By explicitly recognizing
model specification uncertainty and including it through the BMC
procedure as part of the economic control problem, the controller is
appropriately specifying the risk that control entails.

In the following section, the economic control problem is defined,
references to previous work in this area are cited, and the
integration of the BMC procedure and the economic control problem is discussed.

IV.1 The Economic Control Problem
The problem of affecting the outcome of some economic
data-generating process such as the GNP, rate of inflation, or unemployment
rate, is referred to as an economic control problem. More specifically,
given an econometric model of the data-generating process of interest,
a single-period economic control problem involves determining settings
for the model's instruments (controllable variables) in one time
period such that in the next time period the model's dependent variable
(a desideratum or policy objective) is close to a specified target
value or within a specified target interval. Controlling values for
the model's instruments are determined by optimizing an objective or
criterion function that is typically a function of the difference
between the target and realized values of the dependent variable.
An economic control problem may be expressed as:

$$\min_{X} \int L(y - y^*)\, f(y \mid X)\, dy \qquad (4.1)$$


When control of a dependent variable is desired over more than
one time period, the problem is referred to as a multiperiod control
problem. For a discussion of multiperiod control, see Zellner,
336-354.

where y is the dependent variable whose control is desired, y* is the
value or target the controller would like y to attain next period, and
X is the model's vector-valued instrument. L(y - y*) is a loss
function that describes the losses incurred by the controller as a result
of y not equalling the target, y*, or not falling in the target
interval. The loss values may be viewed as opportunity losses or "social
costs". The function f(y|X) is the predictive distribution of future
values of y as determined by the econometric model used to characterize
y. Thus, in this case control of y is effected by setting X in the
current time period so as to minimize next period's expected loss.

If sample information about the process is available, the control
problem is still solved by minimizing expected loss, but the
expectation is taken with respect to a predictive distribution that reflects
the sample information. Letting y and X now refer to observed data
points and $y_F$ and $X_F$ refer to the control-period values of the target
and control variable, respectively, the problem becomes

$$\min_{X_F} \int L(y_F - y_F^*)\, f(y_F \mid y, X, X_F)\, dy_F. \qquad (4.2)$$
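As an illustration of how such a problem is solved for one model, consider a single no-intercept regression with quadratic loss. The sketch below assumes the predictive distribution of $y_F$ has mean `b * x_f` and variance `v * x_f**2 + s2`, where `b` and `v` stand for the posterior mean and variance of the slope and `s2` for the posterior expected error variance. These names and the numerical values are hypothetical and serve only to show the mechanics; the sketch is not taken from the dissertation.

```python
def expected_loss(x_f, b, v, s2, target):
    """E[(y_F - target)^2] when y_F has mean b*x_f and variance v*x_f**2 + s2."""
    return (b * x_f - target) ** 2 + v * x_f ** 2 + s2


# Hypothetical posterior summaries and target value.
b, v, s2, target = 2.0, 0.25, 1.0, 10.0

# Setting the derivative of the expected loss with respect to x_f to zero
# gives the minimizing instrument setting.
x_opt = b * target / (b ** 2 + v)

# Setting that would hit the target exactly if the slope were known to be b.
x_naive = target / b

loss_opt = expected_loss(x_opt, b, v, s2, target)
loss_naive = expected_loss(x_naive, b, v, s2, target)
```

Because slope uncertainty inflates the predictive variance as the instrument moves away from zero, `x_opt` is smaller in magnitude than `x_naive` and yields a lower expected loss under these assumptions.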


For a discussion of the sensitivity of control to the form of
the loss function, see Arnold Zellner and Martin Geisel, "Sensitivity
of Control to Uncertainty and Form of the Criterion Function," in
D. G. Watts (ed.), The Future of Statistics (New York: Academic Press,
1968), 269-283.

In the single-period control problem described above, it is
assumed that the controller knows the correct econometric model or
random process he wishes to control. Accordingly, in solving his
control problem, the controller has only to contend with parameter and
residual uncertainty, not model specification uncertainty. Much work
has been done on such problems by, for example, Fisher, Brainard,
Leland, Basu, and Zellner. The more complicated multiperiod control
problem, in which it is assumed that the controller knows the correct
econometric model of the process he desires to control, has also
received attention. See, for example, Aoki, Prescott, Zellner, Taylor,
and Chow. The approaches to single and multiperiod control of the


For  a  more  complete  discussion  of  control  problems,  see  Zellner, 
319-359. 

Walter D. Fisher, "Estimation in the Linear Decision Model,"
International Economic Review, 3 (January, 1962), 1-29. William
Brainard, "Uncertainty and the Effectiveness of Policy," American
Economic Review, 57 (May, 1967), 411-425. H. Leland, "The Theory of
the Firm Facing Uncertain Demand," American Economic Review, 62 (1972),
278-291. A. Basu, "Economic Regulation Under Parameter Uncertainty,"
(Ph.D. dissertation, Economics Department, Stanford University, 1973).
Arnold Zellner, An Introduction to Bayesian Inference in Econometrics
(New York: John Wiley and Sons, 1971), 319-336.

3 Masanao Aoki, Optimization of Stochastic Systems (New York:
Academic Press, 1967); Edward C. Prescott, "Adaptive Decision Rules for
Macro Economic Planning" (Ph.D. dissertation, Graduate School of
Industrial Administration, Carnegie-Mellon University, 1967); Edward C.
Prescott, "The Multi-Period Control Problem Under Uncertainty,"
Econometrica, 40 (November, 1972), 1043-58; Zellner, pp. 336-54; John B.
Taylor, "Asymptotic Properties of Multiperiod Control Rules in the
Linear Regression Model," Institute for Mathematical Studies in the
Social Sciences, Stanford University, Technical Report No. 79, December,
1972; Gregory C. Chow, "Effect of Uncertainty on Optimal Control
Policies," International Economic Review, 14 (October, 1973), 632-645;
Gregory C. Chow, "A Solution to Optimal Control of Linear Systems with
Unknown Parameters," Econometric Research Program, Princeton University,
Research Memorandum No. 157, December, 1973.

above-mentioned authors are theoretically appropriate only if the
controller can assert with probability one that the model he has
chosen to represent the process whose control is desired is in fact
the correct representation of the process. If the controller can make
such a statement, then in solving his control problem he only has to
contend with the model's parameter and residual uncertainty. If,
however, he specifies the chosen model's appropriateness with a model
probability less than one, he is acknowledging the existence of model
specification uncertainty. Theoretically, if model specification
uncertainty exists, it should be dealt with in control problems. It
should not be ignored or assumed away via some model selection
procedure such as Bayesian Model Selection. Control procedures that fail
to consider model specification uncertainty when it exists are not
optimal procedures. Such procedures, in the sense of Chapter III,
misspecify the uncertainty involved in controlling y and, therefore, the
risk faced by the controller in using them to set the rate or level of
his instruments.

Model specification uncertainty has not been explicitly considered
in the control literature. Since it may have an impact upon optimal
control solutions, it merits consideration. That the consideration of
model specification uncertainty in control contexts is important and warrants

For a discussion of several model selection procedures that are
frequently used to establish econometric models of processes whose
control is desired, see Gaver and Geisel, "Discriminating Among
Alternative Models: Bayesian and Non-Bayesian Methods," pp. 49-77.

a great deal of attention has been expressed by Pierce:


Another area of uncertainty has to do with our
models. I want to stress this because users of
control theory often tend to take models as given
and work out solutions without seriously
questioning the reasonableness of the models. This
tendency is not very harmful when one is working
on technique. However, there is a real danger
of giving more credence to model results than
they deserve, especially if a particular policy
trajectory is highly influenced by the choice
of a model.1


He  goes  on  to  say: 


The  problem  lies  not  with  uncertainty  concerning 
the  true  value  of  the  model  parameters,  but  also 
with  the  structure  of  the  models  themselves.2 


By utilizing the Bayesian Model Comparison procedure to develop
a Bayesian Mixed Model Predictive distribution for the process whose
single-period control is desired, a controller can determine settings
for his instruments in light of residual, parameter, and model
specification uncertainty.3 When single-period control is desired, the
solution to the following minimization problem provides optimal
settings for the controller's instruments, $D_F$:

$$\min_{D_F} \int L(y_F, y_F^*)\, f(y_F \mid y, D, D_F)\, dy_F. \qquad (4.3)$$


1 J. L. Pierce, "Quantitative Analysis for Decisions at the Federal
Reserve," Annals of Economic and Social Measurement, 3 (1974), 1-9.

2 Ibid.

3 That a BMMP in fact reflects model specification uncertainty was
discussed in Chapter III.

Recall that $D = (X, Z)'$ and $D_F = (X_F, Z_F)'$. The function $f(y_F \mid y, D, D_F)$
is the controller's BMMP for the data-generating process. All other
terms in (4.3) are as previously defined. The only difference between
(4.3) and (4.1) or (4.2) is the use of a Bayesian Mixed Model
Predictive in (4.3) rather than a predictive distribution determined from a
single model. Since all relevant major forms of uncertainty (residual,
parameter, and model specification uncertainty) are reflected in
(4.3) and, therefore, influence its solution, it is said that (4.3)
provides optimal settings for $D_F$.

In  the  next  section  of  this  chapter,  assumptions  are  presented 
under  which  various  single-period  control  solutions  are  obtained  using 
the  BMC  procedure  in  the  remainder  of  the  chapter. 

IV.2 Model Space and Assumptions
In the remainder of this chapter, solutions will be derived for
single-period control problems based on the following assumptions:

1. The decision maker (controller) believes that one or
the other of the following two models is an accurate
representation of a data-generating process to be
controlled, but he is unsure which one is correct:

$M_1$: $y = \beta_1 X + \varepsilon$;

$M_2$: $y = \beta_2 Z + \delta$. (4.4)

y is the target variable, and X and Z are two
different nonrandom explanatory variables, instruments

In  what  follows,  control  problems  that  deal  with  a  single  model 
will  be  expressed  as  in  (4.1)  or  (4.2). 
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over  which  the  controller  has  complete  control.  B-,  and 
B2  are  unknown  parameters,  e  and  5  are   the  usual  normally 

distributed  error  terms,  each  with  zero  mean  and  unknown 

2     2 
variance,  a  and  a  ,  respectively.  It  is  also  assumed  that 

cov(e,  ,e)  =  cov(e2,S)  =  cov(e,6)  =  0.  Thus,  M,  and  M„  are 

normal  univariate  regression  models  which,  to  keep  the  number 

of  each  model's  unknown  parameters  at  two,  have  been  forced 

through  the  origin. 

2.  The  data-generating  process  over  which  control  is  desired 
is  stationary. 

3.  X  and  Z  are  uncorrelated,  and  only  the  controllable  variable 
in  the  true  model  affects  y.  Thus,  if  M,  were  the  true 
model,  B?  would  be  zero.  If  neither  M,  nor  no  were  the  true 
model,  it  may  be  that  B-,  =  B?  =  0. 

4.  The controller's loss function is a quadratic loss function
of the form

$$L(y_F, y_F^*) = K(y_F - y_F^*)^2,$$

where $K$ is a constant. In what follows, $K$ is set equal to
one without loss of generality.
Aside from the change in emphasis from forecasting to control, and the
assumption that $X$ and $Z$ are controllable variables, the above
assumptions are similar to those under which the Bayesian Model Comparison
and Bayesian Model Selection procedures were compared in Chapter III
(see Section III.2.4).

In the next section, certainty-equivalent solutions to single-period
control problems will be derived under the above assumptions
with the use of the BMC procedure.

IV.3  Single-Period Certainty-Equivalent Control
If in attempting to control $y$'s value next period, the controller
behaves this period as if $E(y)$ is the value of $y$ that will occur with
certainty next period, then $E(y)$ is said to be a "certainty equivalent"
for $y$.¹ When the process which generates $y$ is known or assumed to be
known, the single-period control problem under parameter and residual
uncertainty is reduced to a deterministic problem. If the process
which generates $y$ is not known, but is believed to be best represented
by one of $N$ alternative models, the single-period control problem under
model specification, parameter, and residual uncertainty reduces to one
of control under model specification uncertainty alone. In this section,
single-period control solutions are derived for the controller
who admits to model specification uncertainty and behaves as if $E(y)$
will occur with certainty next period.

The use of the certainty equivalent $E(y)$ for $y$ reduces models 1
and 2 of Section IV.2 to the following:

$$M_1:\quad E(y) = E(\beta_1)X + E(\epsilon);$$
$$M_2:\quad E(y) = E(\beta_2)Z + E(\delta). \tag{4.5}$$


In (4.5) the parameter and residual uncertainty of $M_1$ and $M_2$ are
treated as if they do not exist. Thus, if neither $M_1$ nor $M_2$ is known
or assumed to be the true model, it is only necessary to deal with
model specification uncertainty.

¹ For a definition and discussion of certainty equivalence, see
Herbert A. Simon, "Dynamic Programming Under Uncertainty with a Quadratic
Criterion Function," Econometrica, 24 (1956), 74-81; and/or
C. Holt, J. F. Muth, F. Modigliani, and H. A. Simon, Planning Production,
Inventories and Work Force (Englewood Cliffs, N.J.: Prentice-Hall,
1960), Chapter 6.

From (4.2), assuming that $M_1$ is the true model, the single-period
control problem is solved by determining:

$$\min_{X_F}\int_{-\infty}^{\infty} L(y_F,y_F^*)\,f(y_F \mid y,X,X_F)\,dy_F = \min_{X_F} E_{y_F \mid y,X,X_F}\,L(y_F,y_F^*). \tag{4.6}$$

The solution to (4.6) yields the controller's minimum expected loss
under $M_1$'s predictive distribution of $y_F$. If the loss function is
quadratic, as is assumed for the remainder of this chapter, (4.6)
becomes:

$$\min_{X_F} E_{y_F \mid y,X,X_F}\,(y_F - y_F^*)^2. \tag{4.7}$$

The use by the controller of $E_{y_F \mid y,X,X_F}(y_F)$ as a certainty equivalent
for $y_F$ reduces (4.7) to the following:

$$\min_{X_F}\,[E_{y_F \mid y,X,X_F}(y_F) - y_F^*]^2. \tag{4.8}$$

Note that (4.8) contains no random terms. Thus, (4.8) is minimized by
the value of $X_F$ that sets $E_{y_F \mid y,X,X_F}(y_F)$ equal to $y_F^*$. From (4.5) it
can be seen that $E_{y_F \mid y,X,X_F}(y_F) = E(\beta_1)X_F + E(\epsilon) = b_1''X_F$. Thus,
the appropriate setting for $X_F$ is one such that $b_1''X_F = y_F^*$. Accordingly,
$X_F$ should be set equal to $y_F^*/b_1''$.¹ This is the single-period
certainty-equivalent solution when it is assumed that model 1 generates
$y_F$. Similarly, when model 2 is assumed to generate $y_F$, the single-period
certainty-equivalent solution is to set $Z_F$ equal to $y_F^*/b_2''$.²

Single-period certainty-equivalent control solutions in which
a particular model is assumed to generate $y_F$ are derived assuming the
mean of $y_F$'s predictive distribution is the value of $y_F$ that will
occur with certainty next period. This is equivalent to assuming that
$\beta_1 = b_1''$ and $\epsilon = 0$. $y_F$'s predictive variance is ignored in the
certainty-equivalent solution. Consequently, such solutions are not
optimal but are only approximations to optimal solutions, as explained
by Zellner.³ In general, since certainty-equivalent control problems
ignore $y_F$'s predictive variance and, therefore, parameter and residual
uncertainty, their solutions are much easier and less costly to obtain
than are optimal control solutions. Consequently, certainty-equivalent
control may at times provide the controller with an attractive
alternative to full-scale optimal control.

¹ This solution can also be found in Zellner, pp. 320-322.
² Notice that these certainty-equivalent solutions make the control
target, $y_F^*$, the mean of $y_F$'s predictive distribution.
³ Zellner, pp. 322-324.

IV.3.1  Certainty-Equivalent Control Using the BMMP Distribution

By using the Bayesian Model Comparison procedure's Bayesian Mixed
Model Predictive (BMMP) distribution as $y_F$'s predictive distribution,
single-period certainty-equivalent control solutions can be derived which
reflect the controller's model specification uncertainty concerning $M_1$
and $M_2$ of the previous section. This approach to certainty-equivalent
control also does not explicitly consider parameter and residual uncertainty
and is, therefore, also suboptimal. But by enabling the controller to
solve his control problems in light of any model specification
uncertainty, this approach may improve the effectiveness of
certainty-equivalent control solutions. As will be seen below, single-period
BMC certainty-equivalent control, as it will be called, requires little
more computational effort than the certainty-equivalent control
solutions derived above in which specification uncertainty was not treated.

The BMC certainty-equivalent control solution can be obtained from
the full-scale BMC control problem of (4.3). (4.3) is repeated here
and the BMC certainty-equivalent control solution is derived below:

$$\min_{D_F}\int_{-\infty}^{\infty} L(y_F,y_F^*)\,f(y_F \mid y,D,D_F)\,dy_F. \tag{4.9}$$

Recall from (3.1) that for the two-model case the BMMP distribution
would be expressed as

$$f(y_F \mid y,D,D_F) = P''(M_1 \mid y,X)\,f(y_F \mid M_1,y,X,X_F) + P''(M_2 \mid y,Z)\,f(y_F \mid M_2,y,Z,Z_F). \tag{4.10}$$

In this case, $D$ and $D_F$ are vectors of control variables: $D = (X,Z)'$
and $D_F = (X_F,Z_F)'$. Accordingly, (4.9) may be written

$$\min_{D_F}\Big[P''(M_1 \mid y,X)\int_{-\infty}^{\infty} L(y_F,y_F^*)\,f(y_F \mid M_1,y,X,X_F)\,dy_F + P''(M_2 \mid y,Z)\int_{-\infty}^{\infty} L(y_F,y_F^*)\,f(y_F \mid M_2,y,Z,Z_F)\,dy_F\Big]. \tag{4.11}$$

Under the assumption that the loss function is quadratic, (4.11) may
be rewritten

$$\min_{D_F}\Big[P''(M_1 \mid y,X)\,E_{y_F \mid M_1,y,X,X_F}(y_F - y_F^*)^2 + P''(M_2 \mid y,Z)\,E_{y_F \mid M_2,y,Z,Z_F}(y_F - y_F^*)^2\Big]. \tag{4.12}$$

The use by the controller of $E_{y_F \mid M_1,y,X,X_F}(y_F) = C_1$ and
$E_{y_F \mid M_2,y,Z,Z_F}(y_F) = C_2$ as certainty equivalents for $y_F$ in $M_1$ and $M_2$,
respectively, means that $y_F$ is no longer treated as being random.
Consequently, (4.12) reduces to

$$\min_{D_F}\Big[P''(M_1 \mid y,X)(C_1 - y_F^*)^2 + P''(M_2 \mid y,Z)(C_2 - y_F^*)^2\Big]. \tag{4.13}$$
Because the right-hand term inside the brackets of (4.13) is not a
function of $X_F$, and the left-hand term is not a function of $Z_F$, the
vector optimizing (4.13), $D_F^*$, may be found by minimizing each of the
terms within the brackets separately. Thus, in order to find $D_F^*$,
the single-period BMC certainty-equivalent control solution, the
following two problems must be solved:

$$\min_{X_F} P''(M_1 \mid y,X)(C_1 - y_F^*)^2 \tag{4.14}$$

and

$$\min_{Z_F} P''(M_2 \mid y,Z)(C_2 - y_F^*)^2. \tag{4.15}$$

Noting that $P''(M_1 \mid y,X)$ is not a function of $X_F$, and that $P''(M_2 \mid y,Z)$ is
not a function of $Z_F$, (4.14) and (4.15) reduce to the following:

$$\min_{X_F}\,(C_1 - y_F^*)^2; \tag{4.16}$$

$$\min_{Z_F}\,(C_2 - y_F^*)^2. \tag{4.17}$$

Notice that (4.16) is the same as (4.8). Thus, for example, in order to
solve (4.16) $X_F$ should be set equal to $y_F^*/b_1''$. Thus, $D_F^* = (y_F^*/b_1'',\; y_F^*/b_2'')$. In
words, the BMC certainty-equivalent control solution is to set $X_F$ as if
$M_1$ were in fact the model generating $y_F$ and to set $Z_F$ as if $M_2$ were the
true model.
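
The rule just stated can be illustrated with a small numerical sketch. It is not part of the original derivation. It assumes that each posterior mean $b_i''$ equals the through-the-origin least-squares slope, $\sum x_i y_i / \sum x_i^2$ (the form attributed to Raiffa and Schlaifer later in this chapter), and the function names and data are hypothetical.

```python
def posterior_slope(xs, ys):
    """Slope of a regression through the origin: sum(x*y) / sum(x^2).

    Assumption: this is used as the posterior mean b'' of the slope.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


def bmc_certainty_equivalent(y, x, z, target):
    """Return (X_F, Z_F): each instrument is set as if its own model were true."""
    b1 = posterior_slope(x, y)  # slope under M1: y = beta1 * X + eps
    b2 = posterior_slope(z, y)  # slope under M2: y = beta2 * Z + delta
    return target / b1, target / b2


if __name__ == "__main__":
    y = [2.0, 4.0, 6.0]
    x = [1.0, 2.0, 3.0]
    z = [1.0, 1.0, 1.0]
    print(bmc_certainty_equivalent(y, x, z, target=8.0))
```

Note that the posterior model probabilities do not appear anywhere in the computation, which mirrors the step from (4.14) and (4.15) to (4.16) and (4.17).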

The rationale behind the BMC control solution is that since the
controller is unsure of which of the two control instruments, $X_F$ or
$Z_F$, affects $y_F$, and since he believes that only one of them actually
affects $y_F$, he should use them both fully in attempting to attain $y_F^*$.
Due to the restrictive assumptions under which it was derived, this
solution is somewhat unrealistic. A more realistic solution would
account for the possibility that (1) costs might be incurred for the
use of an instrument, especially for the use of an inappropriate
instrument; (2) both instruments might affect $y_F$; (3) the instruments
interact in some manner; and/or (4) the process generating $y$ may be
nonstationary. The first of these more realistic cases will be
discussed with respect to optimal BMC control in Section IV.4. At
that time, the appropriate optimal BMC and BMC certainty-equivalent
control solutions for various cases in which instrument use costs are
involved will be derived. Case (2) above is discussed in Section
IV.4.5, and an approach to case (4) is discussed in Chapter V.

For a solution to how to account for the cost of changing the
setting of an instrument in the optimal single-period control problem
in which a particular model is assumed to generate $y_F$, see Zellner,
pp. 324-325.

IV.3.2  Risk Specification in Certainty-Equivalent Control

Even though a controller may behave as though the expected value
of $y_F$ is certain to occur next period, he should not ignore the risk
involved in his choosing to do so. This risk may be represented by
$y_F$'s predictive variance. The larger $y_F$'s predictive variance, the
more likely that $L(y_F,y_F^*) = (y_F - y_F^*)^2$ will be large. Thus, the
controller can use $y_F$'s predictive variance as a measure of the risk
involved in his attempt to attain $y_F^*$. If the risk appears too great,
the controller may choose a different control method, perhaps optimal
single-period control (discussed in Section IV.4), since it considers
the size of $y_F$'s predictive variance in determining settings for the
controller's policy instruments.

If the controller knows that a particular model, say $M_1$, will
generate $y_F$, then, recalling (3.24), $y_F$'s predictive variance and a
measure of the risk being taken by the controller is

$$\sigma_1^2 = \frac{(n-1)S_1^2}{(n-3)}\left[\frac{X_F^2}{\sum_{i=1}^{n} x_i^2} + 1\right], \tag{4.18}$$

where $x_i$ is the $i$th sample observation of $X$. Notice that $\sigma_1^2$ is a function
of the controller's instrument setting, $X_F$. Consequently, since the
control method chosen affects $X_F$, it also influences the size of $\sigma_1^2$.
In the case of certainty-equivalent control, $X_F = y_F^*/b_1''$, and the
predictive variance is

$$\sigma_1^2 = \frac{(n-1)S_1^2}{(n-3)}\left[\frac{y_F^{*2}}{b_1''^{\,2}\sum_{i=1}^{n} x_i^2} + 1\right]. \tag{4.19}$$
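
As a hedged illustration of how this risk measure depends on the instrument setting, the sketch below evaluates the predictive variance for a regression through the origin. It assumes that (4.18) has the Student-t form $\sigma_1^2 = \frac{(n-1)S_1^2}{n-3}\left(X_F^2/\sum x_i^2 + 1\right)$ and that $S_1^2$ is the residual sum of squares divided by $n-1$; the function names and data are hypothetical.

```python
import math


def residual_variance(xs, ys):
    """S^2 for a regression through the origin, with n - 1 degrees of freedom."""
    n = len(xs)
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 1)


def predictive_variance(xs, ys, x_f):
    """Assumed form of (4.18): (n-1) S^2 / (n-3) * (x_f^2 / sum(x^2) + 1)."""
    n = len(xs)
    if n <= 3:
        raise ValueError("the predictive variance is finite only for n > 3")
    s2 = residual_variance(xs, ys)
    sxx = sum(x * x for x in xs)
    return (n - 1) * s2 / (n - 3) * (x_f ** 2 / sxx + 1.0)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 2.0, 5.0]
    for x_f in (0.0, 1.0, math.sqrt(30.0)):
        print(x_f, predictive_variance(xs, ys, x_f))
```

Under these assumptions the variance is smallest at $X_F = 0$ and grows with the square of the setting, so a control rule that pushes $X_F$ further from zero also raises the risk measure.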


If the controller acknowledges model specification uncertainty
and chooses to control via BMC certainty-equivalent control, then,
recalling (3.12), $y_F$'s predictive variance is

$$\sigma^2 = P''(M_1 \mid y,X)\,\sigma_1^2 + P''(M_2 \mid y,Z)\,\sigma_2^2 + P''(M_1 \mid y,X)(\mu_1 - \mu)^2 + P''(M_2 \mid y,Z)(\mu_2 - \mu)^2. \tag{4.20}$$

$\mu_i$ is the mean of $y_F$'s predictive distribution as characterized by
model $i$, and $\mu$ is the mean of the BMMP distribution for $y_F$.
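
The decomposition in (4.20) is the usual mean and variance of a finite mixture, and can be sketched as follows. This is an illustration only: the function name and the numbers are hypothetical, and the posterior model probabilities are taken as given inputs rather than computed.

```python
def bmmp_mean_and_variance(probs, means, variances):
    """Mean and variance of a mixture of predictive distributions.

    probs     -- posterior model probabilities P''(M_i | data), summing to one
    means     -- predictive means mu_i under each model
    variances -- predictive variances sigma_i^2 under each model
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("model probabilities must sum to one")
    mu = sum(p * m for p, m in zip(probs, means))
    # Within-model variance plus the spread of the model means around mu.
    var = sum(p * (v + (m - mu) ** 2) for p, m, v in zip(probs, means, variances))
    return mu, var


if __name__ == "__main__":
    print(bmmp_mean_and_variance([0.5, 0.5], [0.0, 2.0], [1.0, 1.0]))
    print(bmmp_mean_and_variance([0.3, 0.7], [5.0, 5.0], [1.0, 2.0]))
```

When the component means coincide, as in the second call, the last two terms of (4.20) vanish and the mixture variance is simply the probability-weighted average of the component variances.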

Equation (4.18) provides an appropriate risk measure only if the
controller is certain that a particular model will generate $y_F$. If he
utilizes the BMS procedure to choose a model for $y_F$, he is
acknowledging that he is uncertain of the form of the process generating
$y_F$. Consequently, (4.18) is not an appropriate measure of his risk.
If the BMS procedure chooses, say, $M_1$, $\sigma_1^2$ understates the risk
involved in his attempt to attain $y_F^*$. The following lemma is needed
to prove this statement.

LEMMA 7.  Let $n > 3$. When BMS and the max-R rule provide equivalent
methods for choosing between $M_1$ and $M_2$ (see Section III.2.2), and
single-period certainty-equivalent control is applied to the model
chosen by BMS, say, model 1, then $V(\mathrm{BMSP}) = \sigma_1^2 < V(\mathrm{BMMP}) = \sigma^2$.

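PROOF: Suppose the BMS procedure chooses $M_1$, and $M_1$ is used to
control $y_F$. The certainty-equivalent control solution is $X_F = y_F^*/b_1''$.
Accordingly, $\sigma_1^2$ is as shown in (4.19). Raiffa and Schlaifer show that¹

$$b_1'' = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2},$$

where $x_i$ and $y_i$ may be the $i$th sample observation of $X$ and $y$, or
reflect prior information about $\beta_1$ in a form equivalent to sample
observations. Substituting for $b_1''$ in (4.19) yields

$$\sigma_1^2 = \frac{(n-1)S_1^2}{n-3}\left[\frac{y_F^{*2}\sum_{i=1}^{n} x_i^2}{\left(\sum_{i=1}^{n} x_i y_i\right)^2} + 1\right]. \tag{4.21}$$

$M_1$'s estimated residual variance, $S_1^2$, is, by definition,

$$S_1^2 = \frac{\sum_{i=1}^{n}(y_i - b_1'' x_i)^2}{n-1} = \frac{\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} x_i y_i\right)^2 \Big/ \sum_{i=1}^{n} x_i^2}{n-1}. \tag{4.22}$$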


Thus, a necessary and sufficient condition for $S_1^2 = S_2^2$ is

$$\frac{\left(\sum_{i=1}^{n} x_i y_i\right)^2}{\sum_{i=1}^{n} x_i^2} = \frac{\left(\sum_{i=1}^{n} z_i y_i\right)^2}{\sum_{i=1}^{n} z_i^2}, \tag{4.23}$$

where $z_i$ is the $i$th sample observation of $Z$. Accordingly, if $S_1^2 = S_2^2$,
then $\left(\sum x_i y_i\right)^2 / \sum x_i^2 = \left(\sum z_i y_i\right)^2 / \sum z_i^2$ and, noting $\sigma_1^2$'s definition
in (4.21), $\sigma_1^2 = \sigma_2^2$.

If it can be shown that $\partial\sigma_1^2/\partial S_1^2 > 0$, then it can also be said that
$\sigma_1^2 > \sigma_2^2$ when $S_1^2 > S_2^2$. That $\partial\sigma_1^2/\partial S_1^2 > 0$ is demonstrated in the
next paragraph.

¹ Howard Raiffa and Robert Schlaifer, Applied Statistical Decision
Theory (Cambridge, Mass.: M.I.T. Press, 1961), p. 343.

Noting that

$$\frac{\left(\sum_{i=1}^{n} x_i y_i\right)^2}{\sum_{i=1}^{n} x_i^2} = \left[\sum_{i=1}^{n} y_i^2 - (n-1)S_1^2\right] > 0, \tag{4.24}$$

(4.21) can be rewritten

$$\sigma_1^2 = \frac{(n-1)S_1^2}{(n-3)}\left[\frac{y_F^{*2}}{\sum_{i=1}^{n} y_i^2 - (n-1)S_1^2} + 1\right]. \tag{4.25}$$

Taking the partial derivative of (4.25) with respect to $S_1^2$ yields

$$\frac{\partial\sigma_1^2}{\partial S_1^2} = \frac{(n-1)\,y_F^{*2}}{(n-3)\left[\sum y_i^2 - (n-1)S_1^2\right]} + \frac{(n-1)^2 S_1^2\, y_F^{*2}}{(n-3)\left[\sum y_i^2 - (n-1)S_1^2\right]^2} + \frac{(n-1)}{(n-3)}. \tag{4.26}$$

By (4.24), the denominator and, therefore, the entire first term on the
rhs of (4.26) is positive when $n > 3$. The second and third terms on
the rhs of (4.26) are also obviously both positive if $n > 3$.
Consequently, $\partial\sigma_1^2/\partial S_1^2 > 0$.

Under the conditions of this lemma, the BMS procedure selects the
model with the lower $S^2$ (higher $R^2$). Thus, since $\sigma_1^2 = \sigma_2^2$ when $S_1^2 = S_2^2$,
and $\sigma_1^2 > \sigma_2^2$ when $S_1^2 > S_2^2$, the BMS procedure also selects the model with
the lower predictive variance. Recall Theorem 2 of Chapter III in
which it was shown that if the BMS procedure chooses the model with
the lower predictive variance, then $V(\mathrm{BMMP}) \ge V(\mathrm{BMSP})$.¹ Accordingly,
by Theorem 2, the desired result is obtained.
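
The monotonicity step of the proof can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it takes the certainty-equivalent predictive variance to be $\sigma_1^2 = \frac{(n-1)S_1^2}{n-3}\left[\frac{y_F^{*2}}{\sum y_i^2 - (n-1)S_1^2} + 1\right]$, treats $\sum y_i^2$, $n$, and $y_F^*$ as fixed, and compares the analytic derivative with a central finite difference. All names and numbers are hypothetical.

```python
def ce_variance(s2, sum_y2, n, target):
    """Predictive variance under certainty-equivalent control as a function of S^2."""
    explained = sum_y2 - (n - 1) * s2  # equals (sum xy)^2 / sum x^2
    if n <= 3 or explained <= 0:
        raise ValueError("need n > 3 and sum_y2 > (n - 1) * s2")
    return (n - 1) * s2 / (n - 3) * (target ** 2 / explained + 1.0)


def ce_variance_slope(s2, sum_y2, n, target):
    """Analytic derivative of ce_variance with respect to S^2 (three positive terms)."""
    explained = sum_y2 - (n - 1) * s2
    a = (n - 1) / (n - 3)
    return (a * target ** 2 / explained
            + a * (n - 1) * s2 * target ** 2 / explained ** 2
            + a)


if __name__ == "__main__":
    s2, sum_y2, n, target, h = 2.0, 100.0, 10, 5.0, 1e-6
    numeric = (ce_variance(s2 + h, sum_y2, n, target)
               - ce_variance(s2 - h, sum_y2, n, target)) / (2 * h)
    print(numeric, ce_variance_slope(s2, sum_y2, n, target))
```

Because every term of the derivative is positive for $n > 3$, the model with the smaller residual variance also has the smaller predictive variance, which is the fact the lemma relies on.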

If model specification uncertainty exists, the BMMP distribution
of the BMC procedure is the appropriate distribution with which to
characterize $y_F$; any other procedure for determining the predictive
distribution will fail to include relevant information. Accordingly,
when model specification uncertainty exists, the appropriate measure
of the controller's risk is $\sigma^2$, not $\sigma_1^2$. As shown in Lemma 7, $\sigma_1^2$ is
less than $\sigma^2$ and therefore understates the controller's risk.

In this section, certainty-equivalent solutions have been considered,
but certainty-equivalent solutions are not fully optimal in
general. In the next section, optimal single-period BMC control
solutions will be derived.

¹ Recall that Theorem 2 showed that $V(\mathrm{BMMP}) \ge V(\mathrm{BMSP})$. It was
noted, however, that $V(\mathrm{BMMP}) = V(\mathrm{BMSP})$ only when one or the other of
$P'(M_1)$ and $P'(M_2)$ equalled one. But neither of these cases involves
model specification uncertainty and, therefore, neither is of interest
in this dissertation. Therefore, under the conditions of Lemma 7,
$V(\mathrm{BMMP}) > V(\mathrm{BMSP})$ in cases of interest.

IV.4  Optimal Single-Period Control
A control procedure will be referred to as providing an optimal
solution to a control problem if it explicitly recognizes all existing
major forms of uncertainty and utilizes the information provided by
them in its solution to the control problem. Thus, for example, for
a control procedure and its solution to be called optimal when the
controller knows the form of the model generating $y$, but does not
know the parameter values of the model, the procedure need only
consider residual and parameter uncertainty. However, should
specification uncertainty concerning the model be present as well, the procedure
would have to consider residual uncertainty, parameter uncertainty,
and model specification uncertainty. As discussed in Section IV.3, the
certainty-equivalent approach to economic control problems treats
residual and parameter uncertainty suboptimally and, unless BMC
certainty-equivalent control procedures are used, also treats model
specification uncertainty suboptimally. In this section, optimal
control solutions, i.e., solutions that appropriately treat residual,
parameter, and model specification uncertainty, will be derived using
the BMC procedure. These solutions will be referred to as "optimal
BMC control solutions."

Before proceeding with the derivation of optimal BMC control
solutions, mention should be made of the optimal control solution for
the case in which the controller knows the form of the model generating
$y$, but not its parameters. Assuming $M_1$ is the true model, and
employing a quadratic loss function, Zellner shows that the optimal
solution to (4.2) is¹

$$X_F = \frac{y_F^*}{\dfrac{(n-1)S_1^2}{(n-3)\sum_{i=1}^{n} x_i y_i} + \dfrac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}}. \tag{4.27}$$

Equation (4.27) may be rewritten so that its relationship to the
certainty-equivalent solution to this problem may be examined:

$$X_F = \frac{y_F^*}{b_1''}\cdot\frac{1}{\dfrac{(n-1)S_1^2}{(n-3)\,b_1''^{\,2}\sum_{i=1}^{n} x_i^2} + 1}. \tag{4.28}$$


Recall from Section IV.3 that the certainty-equivalent solution is

$$X_F = \frac{y_F^*}{b_1''}.$$

Thus, as Zellner has noted, the certainty-equivalent solution is just
the first term on the rhs of (4.28).² Zellner has shown that as the
precision of the estimation of $\beta_1$ improves (i.e., as the posterior
variance of $\beta_1$ decreases),

$$\frac{(n-1)S_1^2}{(n-3)\,b_1''^{\,2}\sum_{i=1}^{n} x_i^2} \to 0$$

and, accordingly, the second term on the rhs of (4.28) approaches 1.
Thus, if $b_1''$ is a very precise estimate of $\beta_1$, the certainty-equivalent
solution is approximately (4.28). Zellner has also demonstrated that the
use of the certainty-equivalent solution leads to higher expected losses
than the use of (4.28).²

¹ Zellner, pp. 320-322.
² Ibid.
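
A short numerical sketch may make the relationship between the two settings concrete. It is an illustration only, assuming $b_1'' = \sum x_i y_i / \sum x_i^2$, a posterior variance of $\beta_1$ equal to $(n-1)S_1^2 / ((n-3)\sum x_i^2)$, and the quadratic expected-loss expression that the chapter later labels (4.37), with the constant residual-variance term omitted because it does not depend on $X_F$. The function names and data are hypothetical.

```python
def slope_and_variance(xs, ys):
    """Return (b, v): assumed posterior mean and variance of the slope."""
    n = len(xs)
    sxx = sum(x * x for x in xs)
    b = sum(x * y for x, y in zip(xs, ys)) / sxx
    s2 = sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 1)
    return b, (n - 1) * s2 / ((n - 3) * sxx)


def ce_and_optimal_settings(xs, ys, target):
    """Certainty-equivalent setting and its shrunken, loss-minimizing version."""
    b, v = slope_and_variance(xs, ys)
    x_ce = target / b
    shrink = 1.0 / (v / b ** 2 + 1.0)  # second factor of (4.28), at most one
    return x_ce, x_ce * shrink


def expected_loss_excess(x_f, target, b, v):
    """Expected quadratic loss minus the constant residual-variance term."""
    return target ** 2 - 2.0 * target * x_f * b + x_f ** 2 * (v + b ** 2)


if __name__ == "__main__":
    xs, ys, target = [1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 5.0], 11.0
    b, v = slope_and_variance(xs, ys)
    x_ce, x_opt = ce_and_optimal_settings(xs, ys, target)
    print(x_ce, expected_loss_excess(x_ce, target, b, v))
    print(x_opt, expected_loss_excess(x_opt, target, b, v))
```

In this sketch the optimal setting is the certainty-equivalent setting pulled toward zero, and the pull weakens as the posterior variance of the slope shrinks relative to $b_1''^{\,2}$.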

¹ Ibid.
² Zellner, pp. 322-324.

IV.4.1  Optimal BMC Control

The optimal BMC control solution is obtained by minimizing
expected loss over $X_F$ and $Z_F$ using the BMMP distribution of $y_F$. This
problem was stated in (4.3) and is repeated here for convenience:

$$\min_{D_F}\int_{-\infty}^{\infty} L(y_F,y_F^*)\,f(y_F \mid y,D,D_F)\,dy_F. \tag{4.29}$$

(4.29) is solved below for the two-model case (see the assumptions of
Section IV.2) under study in this dissertation.

Substituting (4.10) for $f(y_F \mid y,D,D_F)$ in (4.29), the minimization
problem becomes

$$\min_{D_F}\Big[P''(M_1 \mid y,X)\int_{-\infty}^{\infty} L(y_F,y_F^*)\,f(y_F \mid M_1,y,X,X_F)\,dy_F + P''(M_2 \mid y,Z)\int_{-\infty}^{\infty} L(y_F,y_F^*)\,f(y_F \mid M_2,y,Z,Z_F)\,dy_F\Big]. \tag{4.30}$$

Recalling that $\beta_1$ and $\epsilon$, and $\beta_2$ and $\delta$, are assumed to be independent,
the following transformation of variables can be made in the first
and second terms of (4.30), respectively, so that (4.30) may be written
in a more convenient form:

$$y_F = \beta_1 X_F + \epsilon$$
$$y_F = \beta_2 Z_F + \delta.$$

Thus, utilizing a quadratic loss function for $L(y_F,y_F^*)$, (4.30) may
be written

$$\min_{D_F}\Big\{P''(M_1 \mid y,X)\,E_{\beta_1,\epsilon,\sigma_\epsilon^2 \mid y,X,X_F}[y_F^* - (\beta_1 X_F + \epsilon)]^2 + P''(M_2 \mid y,Z)\,E_{\beta_2,\delta,\sigma_\delta^2 \mid y,Z,Z_F}[y_F^* - (\beta_2 Z_F + \delta)]^2\Big\}. \tag{4.31}$$

It can be seen that, as in the case of the BMC certainty-equivalent
control problem, (4.31) separates into two minimization problems,

$$\min_{X_F} E_{\beta_1,\epsilon,\sigma_\epsilon^2 \mid y,X,X_F}[y_F^* - (\beta_1 X_F + \epsilon)]^2 \tag{4.32}$$

and

$$\min_{Z_F} E_{\beta_2,\delta,\sigma_\delta^2 \mid y,Z,Z_F}[y_F^* - (\beta_2 Z_F + \delta)]^2. \tag{4.33}$$

Recall that (4.2) is the mathematical statement of the control problem
when it is known that $M_1$ will generate $y_F$. After the transformation
of variables noted above, (4.2) and (4.32) are the same. Thus, the
solution to (4.32) will be the same as that derived by Zellner for
(4.2). Except that it is $M_2$ that is known to be generating $y_F$ in
(4.33), the solution to (4.33) will also be of the same form as Zellner's
solution noted in (4.27).

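Even though the solutions to (4.32) and (4.33) are already known,
(4.32) will be solved below for later use as a reference in solving
optimal BMC control problems when instrument use costs are considered.
Squaring the term in brackets in (4.32) yields

$$\min_{X_F} E_{\beta_1,\epsilon,\sigma_\epsilon^2 \mid y,X,X_F}\big[y_F^{*2} - 2y_F^*\beta_1 X_F - 2y_F^*\epsilon + \beta_1^2 X_F^2 + 2\beta_1\epsilon X_F + \epsilon^2\big], \tag{4.34}$$

which can be expressed as

$$\min_{X_F}\big[y_F^{*2} - 2y_F^* X_F E_{\beta_1 \mid y,X}(\beta_1) - 2y_F^* E_{\epsilon \mid y,X,X_F}(\epsilon) + X_F^2 E_{\beta_1 \mid y,X}(\beta_1^2) + 2X_F E_{\beta_1,\epsilon \mid y,X,X_F}(\beta_1\epsilon) + E_{\epsilon \mid y,X,X_F}(\epsilon^2)\big]. \tag{4.35}$$

Recall that $E(\beta_1) = b_1''$, $E(\epsilon) = 0$, $\sigma^2 = E(\epsilon^2) - [E(\epsilon)]^2 = E(\epsilon^2)$,
$V(\beta_1) = E(\beta_1^2) - [E(\beta_1)]^2$, and $\mathrm{cov}(\beta_1,\epsilon) = E(\beta_1\epsilon) - E(\beta_1)E(\epsilon) = 0$. Thus,
(4.35) may be written as

$$\min_{X_F}\big[y_F^{*2} - 2y_F^* X_F b_1'' + X_F^2\{V(\beta_1) + b_1''^{\,2}\} + 2X_F\{\mathrm{cov}(\beta_1,\epsilon) + E(\beta_1)E(\epsilon)\} + \sigma^2\big]. \tag{4.36}$$

Further simplification reduces (4.36) to

$$\min_{X_F}\big[y_F^{*2} - 2y_F^* X_F b_1'' + X_F^2 V(\beta_1) + X_F^2 b_1''^{\,2} + \sigma^2\big]. \tag{4.37}$$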
(4.37) can be solved by taking the partial derivative of the term in
brackets with respect to $X_F$, setting the resultant derivative equal
to zero, and solving for $X_F$. Before proceeding to solve (4.37),
however, recall that

$$b_1'' = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$

and note that the variance of $\beta_1$'s marginal distribution is¹

$$V(\beta_1) = \frac{(n-1)S_1^2}{(n-3)\sum_{i=1}^{n} x_i^2}. \tag{4.38}$$

Thus, the partial derivatives of $b_1''$ and $V(\beta_1)$ with respect to $X_F$ are
both zero. Calling the bracketed term in (4.37) $A$,

$$\frac{\partial A}{\partial X_F} = -2y_F^* b_1'' + 2V(\beta_1)X_F + 2b_1''^{\,2}X_F. \tag{4.39}$$

Setting (4.39) equal to zero, solving for $X_F$, and substituting (4.38)
for $V(\beta_1)$ yields

$$X_F = \frac{b_1''\,y_F^*}{\dfrac{(n-1)S_1^2}{(n-3)\sum_{i=1}^{n} x_i^2} + b_1''^{\,2}}. \tag{4.40}$$

Recalling the definition of $b_1''$ above, $X_F$ may be expressed as

$$X_F = \frac{y_F^*}{\dfrac{(n-1)S_1^2}{(n-3)\sum_{i=1}^{n} x_i y_i} + \dfrac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}}. \tag{4.41}$$

Similarly, the solution to (4.33) is

$$Z_F = \frac{y_F^*}{\dfrac{(n-1)S_2^2}{(n-3)\sum_{i=1}^{n} z_i y_i} + \dfrac{\sum_{i=1}^{n} z_i y_i}{\sum_{i=1}^{n} z_i^2}}. \tag{4.42}$$

¹ Raiffa and Schlaifer, pp. 344-345.

Comparing (4.41) and (4.42) with (4.27), it can be seen that these
solutions are in fact the same as Zellner's. Thus, the optimal BMC
control solution is to set $X_F$ as if it were known that $M_1$ had generated
$y$ and would be generating $y_F$, and to set $Z_F$ as if it were known that
$M_2$ had generated $y$ and would be generating $y_F$.

As with the BMC certainty-equivalent control solution, the
rationale behind the optimal BMC control solution is that since the
controller is unsure of which of the two control instruments, $X_F$ or
$Z_F$, affects $y_F$, he should use them both fully in attempting to attain
$y_F^*$. Due to the restrictive assumptions under which it was derived,
this solution suffers the same lack of realism as did the BMC
certainty-equivalent solution. A more realistic solution would account
for the possibility that (1) costs might be incurred for use of an
instrument, especially for the use of an inappropriate instrument;
(2) both instruments might affect $y_F$; (3) the instruments interact in
some manner; and/or (4) the process generating $y_F$ may be nonstationary.
Situation (1) is treated in the next section, situation (2) is
discussed in Section IV.4.5, and an approach to situation (4) is presented
in Chapter V.

IV.4.2  Optimal BMC Control When Instrument Use Costs Are Considered

The optimal BMC control solution derived in Section IV.4.1 is
appropriate if the assumptions of Section IV.2 hold and the use of
the instruments, $X_F$ and $Z_F$, involves no cost to the controller.
Optimal BMC control solutions will now be derived for the following
cases in which instrument use costs are involved:

Case 1:  The instrument-use cost varies with the level of
instrument usage. Let $X_F$ and $Z_F$ cost the controller $C_1$ and $C_2$ per
squared unit of $X_F$ and $Z_F$ used, respectively.

Case 2:  A fixed as well as a variable instrument-use cost is
incurred by the controller. Let the controller incur a fixed
instrument usage cost of $e_1$ for use of $X_F$ and $e_2$ for use of $Z_F$. Let each
unit of $X_F$ and $Z_F$ used by the controller cost him $d_1$ and $d_2$,
respectively. $e_1$, $e_2$, $d_1$, and $d_2$ are constants known to the controller.

Case 3:  There is no cost incurred by the controller as a result
of his using the appropriate instrument (the instrument that affects
$y_F$), but there is a cost for use of the inappropriate instrument.
Thus, if $X_F$ is used needlessly, let a cost of $C_1$ per squared unit of
$X_F$ used be incurred. If $Z_F$ is used needlessly, let a cost of $C_2$ per
squared unit of $Z_F$ used be incurred.

It is assumed that instrument costs are expressed in the same
measurement units as the losses described by the controller's loss
function. Accordingly, each of the above cost considerations may be
included in the solution to (4.29) by explicitly introducing each into
(4.31).

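For Case 1, if $X_F^\circ$ and $Z_F^\circ$ units of $X_F$ and $Z_F$ are used by the
controller, he will, with probability one, incur costs of $C_1$ per
squared unit of $X_F$ used and $C_2$ per squared unit of $Z_F$ used, no matter
which model is in fact appropriate. Accordingly, with the inclusion
of Case 1 costs, (4.31) becomes

$$\min_{D_F}\Big\{P''(M_1 \mid y,X)\,E_{\beta_1,\epsilon,\sigma_\epsilon^2 \mid y,X,X_F}[y_F^* - (\beta_1 X_F + \epsilon)]^2 + P''(M_2 \mid y,Z)\,E_{\beta_2,\delta,\sigma_\delta^2 \mid y,Z,Z_F}[y_F^* - (\beta_2 Z_F + \delta)]^2 + C_1 X_F^2 + C_2 Z_F^2\Big\}. \tag{4.43}$$

As with (4.31), and for the same reason, (4.43) separates into two
minimization problems,

$$\min_{X_F}\Big\{P''(M_1 \mid y,X)\,E_{\beta_1,\epsilon,\sigma_\epsilon^2 \mid y,X,X_F}[y_F^* - (\beta_1 X_F + \epsilon)]^2 + C_1 X_F^2\Big\} \tag{4.44}$$

and

$$\min_{Z_F}\Big\{P''(M_2 \mid y,Z)\,E_{\beta_2,\delta,\sigma_\delta^2 \mid y,Z,Z_F}[y_F^* - (\beta_2 Z_F + \delta)]^2 + C_2 Z_F^2\Big\}. \tag{4.45}$$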

(4.44) and (4.45) may be solved in the same manner as (4.32) and
(4.33). Accordingly, prior to taking the partial derivative of (4.44)
with respect to $X_F$, (4.44) may be written in the same form as (4.37):

$$\min_{X_F}\Big\{P''(M_1 \mid y,X)\big[y_F^{*2} - 2y_F^* X_F b_1'' + X_F^2 V(\beta_1) + X_F^2 b_1''^{\,2} + V(\epsilon)\big] + C_1 X_F^2\Big\}. \tag{4.46}$$

Call everything inside the braces of (4.46) $A$, and denote $P''(M_1 \mid y,X)$
by $P_1$. Then, the partial derivative of $A$ with respect to $X_F$ is

$$\frac{\partial A}{\partial X_F} = -2y_F^* b_1'' P_1 + 2X_F V(\beta_1)P_1 + 2X_F b_1''^{\,2}P_1 + 2C_1 X_F. \tag{4.47}$$

Setting (4.47) equal to zero and solving for $X_F$, the optimal setting
for $X_F$ is obtained (second-order conditions are easily shown to hold):

$$X_F = \frac{y_F^* b_1'' P_1}{P_1[V(\beta_1) + b_1''^{\,2}] + C_1} = \frac{y_F^*}{\dfrac{V(\beta_1)}{b_1''} + b_1'' + \dfrac{C_1}{P_1 b_1''}}. \tag{4.48}$$

The optimal setting for $Z_F$ may be obtained by solving (4.45) in a
similar fashion:

$$Z_F = \frac{y_F^*}{\dfrac{V(\beta_2)}{b_2''} + b_2'' + \dfrac{C_2}{P_2 b_2''}}. \tag{4.49}$$

Thus, the optimal single-period BMC control solution for Case 1 is

$$D_F = \left(\frac{y_F^*}{\dfrac{V(\beta_1)}{b_1''} + b_1'' + \dfrac{C_1}{P_1 b_1''}},\;\; \frac{y_F^*}{\dfrac{V(\beta_2)}{b_2''} + b_2'' + \dfrac{C_2}{P_2 b_2''}}\right)'. \tag{4.50}$$

The implications of the solutions to Cases 1, 2, and 3 will be
discussed together after all three solutions have been derived.
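
The Case 1 setting can be sketched in code to show how the posterior model probability and the use cost enter. This is an illustration under assumptions, not the author's computation: it takes the setting to be $X_F = y_F^* b_1'' P_1 / (P_1[V(\beta_1) + b_1''^{\,2}] + C_1)$, and the function name, argument names, and numbers are hypothetical.

```python
def case1_setting(target, b, v_beta, p, cost):
    """Instrument setting when use costs `cost` per squared unit.

    target -- desired value of y next period
    b      -- posterior mean of the slope
    v_beta -- posterior variance of the slope
    p      -- posterior probability of the model that uses this instrument
    cost   -- cost per squared unit of the instrument, incurred with certainty
    """
    return target * b * p / (p * (v_beta + b ** 2) + cost)


if __name__ == "__main__":
    target, b, v_beta = 11.0, 1.1, 0.09
    print(case1_setting(target, b, v_beta, p=0.5, cost=0.0))   # cost-free case
    print(case1_setting(target, b, v_beta, p=0.5, cost=0.65))
    print(case1_setting(target, b, v_beta, p=0.1, cost=0.65))
```

With a zero cost the model probability cancels and the cost-free optimal setting is recovered. With a positive cost, a lower model probability moves the setting closer to zero, so the controller leans less on an instrument whose model he doubts.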

For Case 2, a fixed cost of $e_1$ is incurred by the controller
if he uses $X_F$ (i.e., if he sets $X_F$ at any value other than zero), and a
fixed cost of $e_2$ is incurred if he uses $Z_F$. Also, the controller will
incur costs of $d_1$ per unit of $X_F$ used and $d_2$ per unit of $Z_F$ used no
matter which model is appropriate. Thus, since the costs of Case 2,
like Case 1, are incurred irrespective of which model is appropriate,
$e_1 + d_1 X_F$ and $e_2 + d_2 Z_F$ should be added to the expected loss expression
in (4.31). Accordingly, with the inclusion of Case 2 costs, (4.31)
becomes

$$\min_{D_F}\Big\{P''(M_1 \mid y,X)\,E_{\beta_1,\epsilon,\sigma_\epsilon^2 \mid y,X,X_F}[y_F^* - (\beta_1 X_F + \epsilon)]^2 + P''(M_2 \mid y,Z)\,E_{\beta_2,\delta,\sigma_\delta^2 \mid y,Z,Z_F}[y_F^* - (\beta_2 Z_F + \delta)]^2 + e_1 + e_2 + d_1 X_F + d_2 Z_F\Big\}. \tag{4.51}$$
As with (4.31) and (4.43), (4.51) separates into two minimization
problems, one whose solution is the optimal setting for XF, and one
whose solution is the optimal setting for ZF. Solving the two problems
in the same manner that (4.32) and (4.44) were solved, the optimal
setting for XF is found to be

$$X_F = \begin{cases}
\dfrac{y_F^* - \dfrac{d_1}{2P_1 b_1''}}{\dfrac{V(\beta_1)}{b_1''} + b_1''} & \text{if by using } X_F \text{ the controller's total expected loss is reduced by at least } X_F\text{'s fixed use-cost} \\[2ex]
0 & \text{otherwise,}
\end{cases} \tag{4.52}$$
where, again, P"(M1|y,X) is represented by P1.

The optimal setting for ZF turns out to be

$$Z_F = \begin{cases}
\dfrac{y_F^* - \dfrac{d_2}{2P_2 b_2''}}{\dfrac{V(\beta_2)}{b_2''} + b_2''} & \text{if by using } Z_F \text{ the controller's expected loss is reduced by at least } Z_F\text{'s fixed use-cost} \\[2ex]
0 & \text{otherwise,}
\end{cases} \tag{4.53}$$

where P"(M2|y,Z) is represented by P2. Thus, the optimal single-
period BMC control solution for Case 2 is DF = (XF,ZF)', where XF
and ZF are as given in (4.52) and (4.53), respectively.
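The use-or-abstain rule of (4.52) and (4.53) can be sketched in code. This is an illustrative sketch under stated assumptions, not the author's procedure: the function name and inputs are hypothetical, the interior solution is written in the algebraically equivalent form (y* b - d/(2P)) / (V(β) + b²), and the "reduction in expected loss" is computed as the loss at a zero setting minus the loss at the interior setting, net of the per-unit cost.

```python
def optimal_case2_setting(y_target, b, v_beta, p_model, d, e):
    """Interior solution of (4.52)/(4.53), or zero if the fixed cost e is not recovered."""
    x_interior = (y_target * b - d / (2 * p_model)) / (v_beta + b ** 2)
    # Expected loss at 0 minus expected loss at x_interior, including the
    # per-unit cost d but excluding the fixed cost e.
    reduction = (
        p_model * (2 * y_target * x_interior * b - x_interior ** 2 * (v_beta + b ** 2))
        - d * x_interior
    )
    return x_interior if reduction >= e else 0.0


# Hypothetical inputs: the same instrument under a low and a high fixed cost.
x_cheap = optimal_case2_setting(10.0, 2.0, 0.5, 0.6, d=1.2, e=40.0)
x_costly = optimal_case2_setting(10.0, 2.0, 0.5, 0.6, d=1.2, e=50.0)
print(x_cheap, x_costly)
```

In this example the interior setting lowers expected loss by roughly 48 units, so the instrument is used when the fixed cost is 40 and left at zero when it is 50.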

For Case 3, if XF is utilized by the controller when in fact it
is ZF that affects yF, a cost of c1 per squared unit of XF used is
incurred by the controller. Similarly, if ZF is used to control yF
when it is XF that affects yF, a cost of c2 per squared unit of ZF
used is incurred. A cost structure of this nature would be appropriate
if, say, use of the inappropriate instrument caused some undesirable
side effects. For example, suppose a firm were interested in increasing
the productivity of its assembly line workers and believed that an
annual Christmas pay bonus was an appropriate means of so doing. It
may turn out, however, that the bonus does not affect the assembly line
workers' productivity, but does hurt the morale and, therefore, the
productivity of the firm's employees that did not receive a pay bonus.
If such were the case, the pay bonus would be an inappropriate
incentive that would have costly side effects.

Since in Case 3, the controller is uncertain as to which instrument
affects yF, he is uncertain which instrument is inappropriate,
and, therefore, is uncertain of the cost he would incur by utilizing
XF and/or ZF. Thus, unlike Cases 1 and 2, it is the expected cost of
Case 3 that the controller should consider. Noting that the probability
of incurring a cost for use of, say, XF is the probability that M1 is
inappropriate for use in control of yF, i.e., P"(M2|y,Z), the controller
can consider Case 3 costs by including c1XF² and c2ZF² in (4.31) as
follows:

$$\begin{aligned}
\min_{X_F, Z_F}\ \{ & P''(M_1|y,X)\,E_{\beta_1,\varepsilon,\sigma^2|y,X,X_F}[y_F^* - (\beta_1 X_F + \varepsilon)]^2 \\
 & + P''(M_2|y,Z)\,E_{\beta_2,\delta,\sigma^2|y,Z,Z_F}[y_F^* - (\beta_2 Z_F + \delta)]^2 \\
 & + P''(M_1|y,X)\,c_2 Z_F^2 + P''(M_2|y,Z)\,c_1 X_F^2 \}.
\end{aligned} \tag{4.54}$$

As  in  the  previous  cases  considered,  the  minimization  problem  of  (4.54) 
separates  into  two  minimization  problems,  one  of  which  is 

$$\min_{X_F}\ \{P''(M_1|y,X)\,E_{\beta_1,\varepsilon,\sigma^2|y,X,X_F}[y_F^* - (\beta_1 X_F + \varepsilon)]^2 + P''(M_2|y,Z)\,c_1 X_F^2\}. \tag{4.55}$$

The other is

$$\min_{Z_F}\ \{P''(M_2|y,Z)\,E_{\beta_2,\delta,\sigma^2|y,Z,Z_F}[y_F^* - (\beta_2 Z_F + \delta)]^2 + P''(M_1|y,X)\,c_2 Z_F^2\}. \tag{4.56}$$


The solutions to (4.55) and (4.56) may be obtained via the same
procedure used to solve (4.32) and (4.44). Accordingly, the optimal
setting for XF is found to be

$$X_F = \frac{y_F^* b_1''}{V(\beta_1) + (b_1'')^2 + \dfrac{P_2}{P_1} c_1}, \tag{4.57}$$

and the optimal setting for ZF is found to be

$$Z_F = \frac{y_F^* b_2''}{V(\beta_2) + (b_2'')^2 + \dfrac{P_1}{P_2} c_2}. \tag{4.58}$$

In both (4.57) and (4.58), P"(M1|y,X) = P1 and P"(M2|y,Z) = P2.
Thus, the optimal BMC control solution for Case 3 is DF = (XF,ZF)',
where XF and ZF are as given in (4.57) and (4.58), respectively.

The  following  statements  can  be  made  about  the  instrument  settings 
established  by  the  optimal  BMC  control  procedure: 

(1) In each of Cases 1, 2, and 3, as well as the cost-free
case of Section IV.4.1, as the precision of the information
known about an instrument's response parameter, βi,
increases, the instrument's optimal setting increases; i.e.,
as, say, V(β1) decreases, XF increases.

(2) In each of Cases 1, 2, and 3, as the cost per unit (or
cost per squared unit) of an instrument used increases,
the instrument's optimal control setting decreases.

(3) In each of Cases 1, 2, and 3, as the probability of a
particular model being true increases, the optimal setting
of the model's instrument increases, while at the same time
the optimal setting of the other model's instrument
decreases. In Case 3, this is a result of the expected cost
per squared unit of, say, XF decreasing as P"(M1|y,X)
increases.
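The three statements above can be illustrated with the Case 3 solution (4.57). The sketch below is illustrative only: the function name and the numbers are hypothetical, a two-model space is assumed so that the other model's probability is one minus the instrument's own, and yF* and the posterior mean are taken to be positive so that "increases" refers to a larger positive setting.

```python
def optimal_case3_setting(y_target, b, v_beta, p_own, c):
    """(4.57)/(4.58): y* b / (V(beta) + b^2 + (P_other / P_own) c)."""
    p_other = 1.0 - p_own  # two-model space assumed
    return y_target * b / (v_beta + b ** 2 + (p_other / p_own) * c)


base = dict(y_target=10.0, b=2.0, v_beta=0.5, p_own=0.6, c=0.3)
x_base = optimal_case3_setting(**base)

# Statement (1): smaller posterior variance of the response parameter.
x_more_precise = optimal_case3_setting(**{**base, "v_beta": 0.1})
# Statement (2): higher cost per squared unit.
x_costlier = optimal_case3_setting(**{**base, "c": 0.6})
# Statement (3): higher probability of the instrument's own model ...
x_more_probable = optimal_case3_setting(**{**base, "p_own": 0.8})
# ... which lowers the other model's probability from 0.4 to 0.2.
z_base = optimal_case3_setting(**{**base, "p_own": 0.4})
z_less_probable = optimal_case3_setting(**{**base, "p_own": 0.2})

print(x_base, x_more_precise, x_costlier, x_more_probable)
print(z_base, z_less_probable)
```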

Assuredly,  Cases  1,  2,  and  3  are  not  the  only  instrument-use 
cost  situations  that  the  controller  may  face.  Consider  the  following: 

(1)  Zellner  has  examined  the  problem  of  how  to  account  for 

the  cost  of  changing  the  setting  of  an  instrument  in  optimal 
single-period  control  problems  in  which  a  particular  model 
is assumed to generate yF.

(2) The forms of the cost structures used in Cases 1, 2, and 3
are certainly not exhaustive.

(3)  Some  combination  of  Cases  1,  2,  and  3  may  be  appropriate. 

(4)  Some  combination  of  Cases  1,  2,  and/or  3  which  also  involves 
a  cost  for  changing  the  level  or  rate  of  existing  instrument 
settings  may  be  appropriate. 


Zellner, pp. 324-325.

As an example of (2), suppose the cost of using the inappropriate
variable is better expressed by a linear function than by the quadratic
function used in Case 3. Then the optimal BMC control settings for XF
and ZF would be

$$X_F = \frac{y_F^* b_1'' - \dfrac{P_2 d_1}{2P_1}}{V(\beta_1) + (b_1'')^2} \tag{4.59}$$

and

$$Z_F = \frac{y_F^* b_2'' - \dfrac{P_1 d_2}{2P_2}}{V(\beta_2) + (b_2'')^2}. \tag{4.60}$$


Concerning (3), suppose Cases 1 and 3 are both relevant. Let Case 1
costs be as described above. For Case 3, let a cost of g1 per squared
unit of XF used be incurred if XF is the inappropriate instrument,
or g2 per squared unit of ZF used if ZF is the inappropriate instrument.
Then, the optimal BMC control settings for XF and ZF would be

$$X_F = \frac{y_F^* b_1''}{V(\beta_1) + (b_1'')^2 + \dfrac{P_2}{P_1} g_1 + \dfrac{c_1}{P_1}} \tag{4.61}$$

and

$$Z_F = \frac{y_F^* b_2''}{V(\beta_2) + (b_2'')^2 + \dfrac{P_1}{P_2} g_2 + \dfrac{c_2}{P_2}}. \tag{4.62}$$


In  the  next  section,   the  certainty-equivalent  BMC  control    solutions 
for  Cases  1,   2,   and  3  of  this  section  are  presented. 

IV.4.3 Certainty-Equivalent BMC Control Solutions When Instrument
Use Costs Are Considered

The certainty-equivalent BMC control solutions for Cases 1, 2,
and 3 of the previous section are derived from the minimization problem
of (4.13):

$$\min_{X_F, Z_F}\ \{P''(M_1|y,X)[E_{y_F|M_1,y,X,X_F}(y_F) - y_F^*]^2 + P''(M_2|y,Z)[E_{y_F|M_2,y,Z,Z_F}(y_F) - y_F^*]^2\}. \tag{4.63}$$


The  solutions  are  derived  in  the  same  manner  as  the  optimal  BMC  control 
solutions  for  Cases  1,  2,  and  3. 

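The following are the certainty-equivalent BMC control solutions
for Cases 1, 2, and 3:

Case 1:

$$X_F = \frac{y_F^*}{b_1'' + \dfrac{c_1}{P_1 b_1''}} \tag{4.64}$$

$$Z_F = \frac{y_F^*}{b_2'' + \dfrac{c_2}{P_2 b_2''}} \tag{4.65}$$

Case 2:

$$X_F = \frac{y_F^* - \dfrac{d_1}{2P_1 b_1''}}{b_1''} \tag{4.66}$$

$$Z_F = \frac{y_F^* - \dfrac{d_2}{2P_2 b_2''}}{b_2''} \tag{4.67}$$

Case 3:

$$X_F = \frac{y_F^* b_1''}{(b_1'')^2 + \dfrac{P_2}{P_1} c_1} \tag{4.68}$$

$$Z_F = \frac{y_F^* b_2''}{(b_2'')^2 + \dfrac{P_1}{P_2} c_2} \tag{4.69}$$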


The  notation  used  in  (4.64)  through  (4.69)  is  the  same  as  that  of  the 
previous  section.  Notice  that  the  only  difference  between  these 
solutions  and  those  of  the  previous  section  is  in  the  denominators. 
In particular, the V(βi) term does not appear in the denominators of
(4.64) through (4.69). As a result, the certainty-equivalent BMC
control settings for XF and ZF are at least as high in all cases as
the optimal BMC control settings. The absence of V(βi) in the
denominator, and the resulting higher settings of XF and ZF, is due to the
failure of the certainty-equivalent approach to consider the
uncertainty of yF.

By not considering yF's uncertainty, the certainty-equivalent
approach ignores some of the risk involved in choosing settings for
XF and ZF. The optimal BMC control problem reflects this risk; i.e.,
it recognizes that the higher are the settings for XF and ZF, the
higher will be yF's predictive variance, and, therefore, the greater
is the probability of |yF - yF*| being very large. The optimal BMC
control procedure sets XF and ZF in such a manner that the probability
of the controller incurring a large loss as a result of yF's predictive
variance being large is smaller than under the certainty-equivalent
approach. Accordingly, if the controller's loss function is
symmetric (as with the quadratic loss function used in this chapter),
optimal BMC control settings for XF and ZF will be lower than those of
certainty-equivalent BMC control.
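The gap between the two approaches can be made concrete for Case 1. The sketch below is an illustration only: the function names and numbers are hypothetical, the optimal setting is (4.48) divided through by P1, and the certainty-equivalent setting is obtained by minimizing P1(b1"XF - yF*)² + c1XF², which is (4.48) with V(β1) removed from the denominator.

```python
def optimal_setting(y_target, b, v_beta, p_model, c):
    """Optimal BMC setting for Case 1, i.e. (4.48) divided through by P."""
    return y_target * b / (v_beta + b ** 2 + c / p_model)


def certainty_equivalent_setting(y_target, b, p_model, c):
    """Same expression with the V(beta) term dropped from the denominator."""
    return y_target * b / (b ** 2 + c / p_model)


# Hypothetical inputs; only the posterior variance V(beta) is varied.
x_ce = certainty_equivalent_setting(10.0, 2.0, 0.6, 0.3)
for v_beta in (0.0, 0.25, 0.5, 1.0):
    x_opt = optimal_setting(10.0, 2.0, v_beta, 0.6, 0.3)
    print(v_beta, round(x_opt, 4), round(x_ce, 4))
```

The two settings coincide when V(β1) = 0 and the optimal setting falls further below the certainty-equivalent one as V(β1) grows.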


The certainty-equivalent BMC control settings will be higher in
all cases than the optimal BMC control settings if V(βi) > 0, i = 1, 2,
i.e., if β1 and β2 are not known.


For a discussion of single-period control when the controller's
loss function is asymmetric, see Roger N. Waud, "Asymmetric Policy-
Maker Utility Functions and Optimal Policy Under Uncertainty,"
Econometrica, 44 (January, 1976), 53-66.

The risk taken by the controller in setting his instruments is
considered from the point of view of optimal BMC control in the next
section.

IV.4.4 Risk Specification in Optimal BMC Control

As discussed extensively in Chapter III and in Sections IV.3.2
and IV.4, if residual, parameter, and model specification uncertainty
exist for a controller and/or forecaster, the BMMP is the appropriate
distribution to use to characterize the process generating yF. Thus,
if the predictive variance of yF is viewed as a measure of the risk the
controller takes in setting his instruments so as to attain yF*, the
variance of the BMMP distribution is the appropriate measure of his
risk. The optimal BMC control procedure utilizes a BMMP to determine
control settings for XF and ZF. Accordingly, by using optimal BMC
control, the controller can determine settings for his instruments
which appropriately reflect his uncertainty concerning the process
generating yF and the risk he will be taking in attempting to attain
yF*. As previously noted, this is why it is said that the BMC control
procedure, introduced in Section IV.4, yields optimal instrument
settings for single-period control problems. Control procedures that
ignore existing model specification uncertainty ignore available
information concerning the risk involved with controlling yF, and, by
definition, cannot yield optimal instrument settings.


IV.4.5 BMC Control When More Complicated Models Are Included in the
Model Space

Two reasons why the model space considered in the previous
sections of this chapter might be employed by a controller are the
following:

(1)  The  controller  believes  that  the  target  variable  is  affected 
exclusively  by  one  or  the  other  of  the  two  instruments. 

(2)  It  is  more  convenient  and/or  computationally  efficient 
for  the  controller  to  specify  his  beliefs  about  the 
nature  of  the  data-generating  process  via  two  simple 
models. 

In  either  case,  both  the  certainty-equivalent  and  optimal  BMC  control 
solutions  for  such  model  spaces,  when  instrument  use  costs  are  not 
considered,  would  have  the  controller  set  each  instrument  as  if  it 
were  the  only  instrument  that  affected  the  target  variable.  Such 
solutions,  and  therefore  such  simplistic  model  spaces,  could  in 
reality  lead  to  significant  over-shooting  or  under-shooting  of  the 
established  target.  For  example,  suppose  the  target  variable  were  the 
rate  of  inflation  in  period  t  and  the  two  instruments  were  government 
expenditures and the rate of growth of the money supply in period t-1.
The above-mentioned control solutions would set government expenditures
as if the growth rate of the money supply had no effect on the inflation
rate, and would set the growth rate of the money supply as if
government expenditures had no effect on the inflation rate. Should both
government expenditures and the growth rate of the money supply in
period t-1 be positively related to the inflation rate in period t,
such a control policy could result in overspending, an inappropriately
high money supply level, and, therefore, a rate of inflation above the
targeted level.

If  the  controller  believes  that  both  instruments  might  affect  the 
target  variable  he  can  help  avoid  over-shooting  or  under-shooting  his 
target  by  expanding  the  model  space  to  reflect  his  belief.  The 
following  are  examples  of  two  possible  expansions: 


(1)  M1:  y = β1X + ε
     M2:  y = β2Z + δ
     M3:  y = β3X + β4Z + γ

(2)  M1:  y = β1X + ε
     M2:  y = β2Z + δ
     M3:  y = β3X + β4Z + β5XZ + γ.

In both cases, the inclusion of model three forces the control
solutions yielded by the BMC procedures to reflect the possible dependence
of yF on both XF and ZF, as well as the possibility that only one or
the other of XF and ZF affects yF. Model three of example one describes
the dependence of yF on XF and ZF as being additive. Model three of
example two describes yF as being dependent on the interaction between
XF and ZF, as well as XF and ZF main effects. Thus, depending on the
magnitudes of the β coefficients, the XF and ZF settings yielded by
the BMC certainty-equivalent and optimal control procedures may be
lower or higher than in the two-model cases previously considered.
Analytic BMC certainty-equivalent and optimal control solutions
for more complex model spaces, like the two above, can be found by the
straightforward application of the methods used to solve the two-model
cases earlier in this chapter. One complicating difference arises,
however, when the model set reflects the target variable as being
possibly dependent on more than one instrument. In such cases, the
solution to the control problem, i.e., the minimization problem of
(4.3), does not separate into a series of independent minimization
problems, one for each instrument, as was the case in (4.13) and (4.31).
Instead, the simultaneous solution of two or more equations is required
to obtain control settings for the instruments. For example, obtaining
the optimal BMC control settings for XF and ZF utilizing the model space
of example one above requires the simultaneous solution of two equations
in two unknowns. This presents no particular problem unless the
equations to be solved simultaneously are of degree higher than one.
In that case, the determination of instrument settings may involve the
cumbersome task of finding the roots of high degree polynomials. This
is precisely what happens when the model space of example two above
is utilized to find optimal BMC control settings for XF and ZF.
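For the additive model space of example one, the two first-order conditions are linear in XF and ZF. The following is a hedged sketch of that simultaneous solution in the cost-free case. It is not taken from the text: the function names, the symbol cov34 (posterior covariance of β3 and β4 under M3), and all numbers are assumptions introduced for illustration, and the system is derived by differentiating the probability-weighted sum of the three models' expected quadratic losses.

```python
def expected_loss(x, z, y_target, p, b, v, cov34):
    """Probability-weighted expected quadratic loss, constants in x and z dropped."""
    p1, p2, p3 = p
    b1, b2, b3, b4 = b
    v1, v2, v3, v4 = v
    l1 = -2 * y_target * b1 * x + x * x * (v1 + b1 ** 2)
    l2 = -2 * y_target * b2 * z + z * z * (v2 + b2 ** 2)
    l3 = (-2 * y_target * (b3 * x + b4 * z) + x * x * (v3 + b3 ** 2)
          + z * z * (v4 + b4 ** 2) + 2 * x * z * (cov34 + b3 * b4))
    return p1 * l1 + p2 * l2 + p3 * l3


def optimal_settings_additive(y_target, p, b, v, cov34):
    """Solve the two linear first-order conditions for (X_F, Z_F) by Cramer's rule."""
    p1, p2, p3 = p
    b1, b2, b3, b4 = b
    v1, v2, v3, v4 = v
    # System: a11 X + a12 Z = r1 and a12 X + a22 Z = r2.
    a11 = p1 * (v1 + b1 ** 2) + p3 * (v3 + b3 ** 2)
    a22 = p2 * (v2 + b2 ** 2) + p3 * (v4 + b4 ** 2)
    a12 = p3 * (cov34 + b3 * b4)
    r1 = y_target * (p1 * b1 + p3 * b3)
    r2 = y_target * (p2 * b2 + p3 * b4)
    det = a11 * a22 - a12 ** 2
    return (r1 * a22 - a12 * r2) / det, (a11 * r2 - a12 * r1) / det


# Hypothetical posterior summaries for the three models.
params = dict(y_target=10.0, p=(0.3, 0.3, 0.4), b=(2.0, 1.5, 1.0, 0.8),
              v=(0.5, 0.4, 0.3, 0.2), cov34=0.05)
x_f, z_f = optimal_settings_additive(**params)
print(x_f, z_f)
```

When the probability of M3 is zero the cross term vanishes and each setting reduces to yF* b / (V(β) + b²), the form obtained by setting the cost terms in the Case 1 solution to zero. The interaction model of example two would make the first-order conditions nonlinear, so this linear solve would no longer apply.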

This  chapter  discussed  the  application  of  the  BMC  procedure  to 
single-period  economic  control  problems.  It  was  noted  that,  unlike 
the  traditional  control  solutions  which  are  artificially  conditioned 
on  the  assumption  that  a  particular  econometric  model  is  the  true  model 
of  the  data-generating  process,  BMC  control  solutions  appropriately 
reflect the controller's model specification uncertainty. Analytic
solutions were found for simple single-period control problems using
both  the  BMC  certainty-equivalent  and  BMC  optimal  approaches  to 
control.  Solutions  were  obtained  assuming  control  was  cost  free,  as 
well  as  for  cases  in  which  instrument-use  cost  functions  were  assumed 
to  be  known.  A  discussion  of  several  important  questions  regarding 
the  relation  of  BMC  control  to  other  control  methods  is  deferred  until 
Chapter  VI. 

The  next  chapter  introduces  a  Bayesian  procedure  for  making 
inferences  about  certain  types  of  nonstationary  data-generating 
processes. 


CHAPTER  V 
BAYESIAN  MODEL  SWITCHING 

In  an  approach  to  modeling  the  uncertainty  involved  in 
decision  making  situations  referred  to  here  as  the  Bayesian 
Model  Switching  (BMSW)  approach,  it  is  assumed  that  the  random 
variable  upon  which  a  decision  hinges  is  generated  by  different 
statistical  models  in  different  time  periods  with  the  switch 
between  models  described  by  some  random  process.  In  this  chapter 
the  methodology  of  the  BMSW  approach  is  developed  and  applied  to 
the  case  where  it  is  known  that  the  decision  variable  is  generated 
by  two  different  "switching"  normal  distributions.  As  a  practical 
approach  to  handling  model  nonstationarity  the  BMSW  methodology 
is  shown  to  be  computationally  unwieldy  even  when  the  number  of 
time  periods  considered  is  very   small.  It  is  hoped,  however, 
that  the  BMSW  approach  yields  useful  new  insights  into  the 
problem  of  modeling nonstationary  processes. 

V.1  Bayesian Model Switching Methodology
The  BMSW  approach  was  suggested  by  anomalies  observed  in 
sequences  of  posterior  model  probabilities  of  sets  of  competing 
models.  For  example,  in  comparing  five  alternative  aggregate 
consumption functions via the Bayesian Model Comparison
procedure, Wiginton¹ generated the following sequence and others:²

Year    P(M1)   P(M2)   P(M3)   P(M4)   P(M5)
1911    .008    .781    .200    .011    *
1912    .005    .796    .193    .006
1913    .004    .794    .199    .003
1914    .001    .600    .395    .003
1915    *       .015    .982    .002
1916    *       .010    .990    *

where  *  signifies  posterior  probability  <  10  .  Notice  the 
dramatic  change  that  occurs  in  the  posterior  probabilities  of 
model  2  and  model  3  between  the  years  1914  and  1915.  With  only 
one  additional  bit  of  data  (i.e.,  1915's  aggregate  consumption), 
model 2's posterior probability drops from .6 to .015 and model
3's increases from .395 to .982. Given that during any one particular
period one of the five alternative representations of
the  aggregate  consumption  process  is  appropriate,  the  significant 
changes  in  posterior  model  probabilities  suggest  that  the  data 
being  observed  may  have  been  generated  by  different  models  in 
different  time  periods.   In  the  above  example,  it  appears  that 
model  2  may  have  been  the  more  accurate  representation  of  the 
aggregate  consumption  process  before  1915  and  model  3  the  more 
accurate  during  and  after  1915.  Thus,  in  situations  where  the 


¹John C. Wiginton, "A Bayesian Approach to Discrimination
among Economic Models," Decision Sciences, 5 (April, 1974), 190.

²Martin S. Geisel, "Bayesian Comparison of Simple Macroeconomic
Models," Journal of Money, Credit, and Banking, 5
(October, 1973), 759-762.
posterior model probabilities tend to vacillate rather than
converge over time, it may be better to assume that the process
generating  the  observations  of  interest  is  nonstationary  rather 
than  assuming  stationarity  as  is  done  in  the  applications  of  the 
Bayesian  Model  Comparison  and  Bayesian  Model  Selection  procedures 
considered  in  this  dissertation.  Certainly  environmental  change 
is  the  rule  rather  than  the  exception,  and  could  explain  movements 
in  posterior  model  probabilities  such  as  those  witnessed  above. 
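The size of the 1914 to 1915 shift can be expressed as an implied likelihood ratio. Under the stationarity assumed by the Bayesian Model Comparison procedure, the posterior odds after a new observation equal the odds before it times the ratio of the two models' predictive densities at that observation. The sketch below uses only the tabled probabilities for models 2 and 3; the function name is hypothetical, and the result is approximate because the tabled values are rounded.

```python
def implied_likelihood_ratio(prior_a, prior_b, post_a, post_b):
    """Predictive-density ratio f(y_new | A) / f(y_new | B) implied by the change in odds."""
    return (post_a / post_b) / (prior_a / prior_b)


# Model 3 (A) against model 2 (B): 1914 probabilities act as the prior,
# 1915 probabilities as the posterior.
ratio = implied_likelihood_ratio(prior_a=0.395, prior_b=0.600,
                                 post_a=0.982, post_b=0.015)
print(round(ratio, 1))
```

On these figures the single 1915 observation would have to be roughly a hundred times more probable under model 3 than under model 2, which is the kind of abrupt evidence a switch between data-generating models would produce.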

Given  a  set  of  N  alternative  switching  models  the  BMSW 
approach  assumes  that  one  of  the  N  models  is  "in  control"  each 
period  (i.e.,  is  an  appropriate  representation  of  the  process 
generating  the  random  variable  of  interest  each  period)  and  that 
all  N  models  have  a  positive  probability  of  "taking  control"  in 
any  period.   It  is  not  known,  however,  which  model  is  in  control 
in  any  particular  period.  Utilizing  a  multinomial  process  to 
describe  the  switches  among  models  from  period  to  period,  the 
random  variable  of  interest  may  be  characterized  as  follows: 

$$f(Y|\theta_1, \theta_2, \ldots, \theta_N) = \pi_1 f(Y|M_1, \theta_1) + \pi_2 f(Y|M_2, \theta_2) + \cdots + \pi_N f(Y|M_N, \theta_N). \tag{5.1}$$

Y is the variable of interest, Mi stands for "model i", θi is
the unknown parameter of model i (possibly a vector), and πi is
the probability that model i is in control during any particular
period. For simplicity it will be assumed that the πi's are
known (with π1 + π2 + ... + πN = 1). The function f(Y|Mi, θi) is the
probability distribution of Y as characterized by model i. Thus the
process generating Y is modeled as a convex linear combination of the N
alternative switching models.
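As a concrete instance of (5.1), the following sketch evaluates the switching density when every component model is normal with known parameters. The function names, weights, means, and standard deviations are hypothetical choices made for illustration; the check at the end confirms that a convex combination of densities is itself a density.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def switching_density(y, weights, means, sds):
    """Equation (5.1) with each f(Y | M_i, theta_i) taken to be normal."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))


# Two hypothetical switching models: model 1 in control 70% of the time.
weights, means, sds = (0.7, 0.3), (0.0, 5.0), (1.0, 2.0)

# Riemann-sum check that the mixture integrates to about one over [-20, 20].
step = 0.01
total = sum(switching_density(-20 + step * k, weights, means, sds) * step
            for k in range(4001))
print(round(total, 6))
```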

The characterization of a random process as a linear
combination (or mixture) of two or more other processes can arise as
a result of "richness" considerations as well as from
nonstationarity. Statistical models are, after all, only approximations
to reality, and a mixture of models may, for reasons of
computational efficacy, intuitive appeal, and/or predictability, prove
to be a better approximation than any one model by itself.

Quandt (and later Goldfeld and Quandt)¹ uses a mixture of
processes from the same family to model the behavior of his
"switching regression regimes." He was concerned with
discontinuous shifts in the parameters (specifically the regression
coefficients) of a single linear regression model. The
methodology of this chapter, however, can be applied equally well
to shifts in the parameters of the controlling model, and/or to
switches in the mathematical form of the controlling model, and/
or to switches in the independent variables of the controlling
model over time. Swamy and Mehta note that the likelihood


¹Richard E. Quandt, "A New Approach to Estimating Switching
Regressions," Journal of the American Statistical Association, 67
(March, 1972), 306-310. Stephen M. Goldfeld and Richard E. Quandt,
"The Estimation of Structural Shifts by Switching Regressions,"
Annals of Economic and Social Measurement, 2 (December, 1973),
475-485.
function yielded by Quandt's classical approach to estimating
the parameters of his switching regimes generates likelihoods
that are "very high along unreasonable paths of the parameter
space. ..."¹ They also point out that the parameter vector
containing the parameters of Quandt's regression regimes is not
identifiable without restrictions. Drawing on Lindley's work,
Swamy and Mehta suggest that the problems that arise in Quandt's
approach can be avoided if proper prior distributions for the
parameters in his switching regimes are employed.² The approach
taken in this chapter is essentially that suggested by Swamy
and Mehta, the difference being that the methodology of this
chapter explicitly permits the model of the changing data-
generating process to reflect changes in the process's
mathematical form and/or controlling variables as well as changes in
its parameters.

As it stands, with its parameters unknown, (5.1) is not
very useful to the decision maker. What is needed is a predictive
distribution, a distribution of as-yet-to-be-seen Y values, f(Y),
which is not conditioned on the parameters. However, the
development of such a distribution, as will be seen below, is extremely
difficult even in the special cases in which N = 2 and M1 and M2


¹P. A. V. B. Swamy and J. S. Mehta, "Bayesian and Non-
Bayesian Analysis of Switching Regressions and of Random
Coefficient Regression Models," Journal of the American Statistical
Association, 70 (September, 1975), 593.

²Ibid.
are both normal models with known variances and unknown means.
A proxy, therefore, is needed for the desired predictive. One
proxy is the distribution arrived at by substituting estimates of
θ1, θ2, ..., θN obtained via a Bayesian estimation procedure into
f(Y|θ1, θ2, ..., θN).

In order to be able to obtain estimates for the unknown
parameters and be able to revise them as new observations on Y
are observed, thus revising f(Y), it is necessary to establish a
scheme for revising the decision-maker's prior distribution on θi.
It is assumed that initially the decision maker knows the set of
N switching models and knows πi, i = 1, 2, ..., N. All other
relevant information possessed by the decision maker is summarized
in his prior distributions of the parameters of the switching
models, g'(θi), i = 1, 2, ..., N. When one or more new Y values
are observed (i.e., when the process being modeled is sampled),
g'(θi) is revised via Bayes' Rule as follows:

$$g''(\theta_i|Y) = \frac{g'(\theta_i)\,\ell(Y|\theta_i)}{\int g'(\theta_i)\,\ell(Y|\theta_i)\,d\theta_i}. \tag{5.2}$$

The function g'(θi) is the decision-maker's prior distribution of
θi, g"(θi|Y) is the decision-maker's posterior distribution of
θi, and ℓ(Y|θi) reflects the likelihood of a particular value
of θi having "generated" the observed Y. The likelihood function,
ℓ(Y|θi), is structurally quite unusual. Like f(Y), it is a
mixture, but instead of being a mixture of models, it is a mixture
of likelihood functions:

$$\ell(Y|\theta_i) = \pi_1 f(Y|M_1) + \pi_2 f(Y|M_2) + \cdots + \pi_i f(Y|M_i, \theta_i) + \cdots + \pi_N f(Y|M_N). \tag{5.3}$$

πi, as before, is the probability that model i is in control
during any particular period. The function f(Y|Mi), given Y, is
a model likelihood. It reflects the likelihood of observing Y
given that Y was generated by model i (i.e., given that model i
was in control). The function f(Y|Mi, θi) is also a likelihood
function, but it reflects the likelihood of observing Y given
that Y was generated by model i and that model i's parameter
is θi. Thus, the probability revision of (5.2) is carried out
without assuming that a particular model has generated the
observed data, Y. This is unlike the revision processes in the
BMC and BMS approaches where in updating either f'(θi) or P'(Mi)
it is always assumed that the observed data have been generated
by the model being revised.

Equation (5.3) is developed as follows:

$$\ell(Y|\theta_i) = \pi_1 f(Y|M_1, \theta_i) + \pi_2 f(Y|M_2, \theta_i) + \cdots + \pi_i f(Y|M_i, \theta_i) + \cdots + \pi_N f(Y|M_N, \theta_i) \tag{5.4}$$

where

$$f(Y|M_j, \theta_i) = f(Y|M_j), \quad i \neq j, \text{ and} \tag{5.5}$$

$$f(Y|M_j) = \int f(Y|M_j, \theta_j)\,g'(\theta_j)\,d\theta_j. \tag{5.6}$$

Thus, upon substituting (5.5) and (5.6) into (5.4), (5.3) is
obtained. Equation (5.3) is repeated here:

$$\ell(Y|\theta_i) = \pi_1 f(Y|M_1) + \pi_2 f(Y|M_2) + \cdots + \pi_i f(Y|M_i, \theta_i) + \cdots + \pi_N f(Y|M_N). \tag{5.7}$$

In (5.2) θi is revised without assuming that the observed data
was generated by model i. Thus, the likelihood function being
used  in  the  revision  process  should  reflect  the  likelihood  of 
the  data  having  been  generated  by  each  of  the  N  switching  models. 
Accordingly,  a  likelihood  function  that  is  a  mixture  of  model 
likelihoods,  where  the  mixing  weights  are  the  probabilities  that 
particular  models  generated  Y,  is  appropriate. 

Now continuing with the development of the revision process,
substitute (5.7) into (5.2):

$$g''(\theta_i|Y) = \frac{g'(\theta_i)[\pi_1 f(Y|M_1) + \cdots + \pi_i f(Y|M_i, \theta_i) + \cdots + \pi_N f(Y|M_N)]}{\int_{\Theta_i} g'(\theta_i)[\pi_1 f(Y|M_1) + \cdots + \pi_i f(Y|M_i, \theta_i) + \cdots + \pi_N f(Y|M_N)]\,d\theta_i} \tag{5.8}$$

$$= \frac{g'(\theta_i)[\pi_1 f(Y|M_1) + \cdots + \pi_i f(Y|M_i, \theta_i) + \cdots + \pi_N f(Y|M_N)]}{\pi_1 f(Y|M_1) + \cdots + \pi_i f(Y|M_i) + \cdots + \pi_N f(Y|M_N)}. \tag{5.9}$$

The denominator of (5.9) is in a form similar to the predictive
distribution of interest. Y's predictive distribution prior
to sampling is:

$$f(Y) = \pi_1 f(Y|M_1) + \cdots + \pi_i f(Y|M_i) + \cdots + \pi_N f(Y|M_N). \tag{5.10}$$

Unlike  the  model  in  equation  (5.1),  f(Y)  is  not  conditioned  on 
the  parameters  of  the  switching  models.  The  decision-maker's 
uncertainty  about  the  parameters  is  reflected  in  (5.10),  making 
(5.10)  generally  more  useful  to  the  decision  maker  than  (5.1). 
After Y is observed and g'(θi), i = 1, 2, ..., N is revised,
the following predictive distribution is obtained:

$$\begin{aligned}
f(Y_F|Y) &= \int_{\Theta_i} g''(\theta_i|Y)[\pi_1 f(Y_F|M_1, Y) + \cdots + \pi_i f(Y_F|M_i, \theta_i) + \cdots + \pi_N f(Y_F|M_N, Y)]\,d\theta_i \\
&= \pi_1 f(Y_F|M_1, Y) + \cdots + \pi_i f(Y_F|M_i, Y) + \cdots + \pi_N f(Y_F|M_N, Y),
\end{aligned} \tag{5.11}$$

where

$$f(Y_F|M_i, Y) = \int_{\Theta_i} g''(\theta_i|Y)\,f(Y_F|M_i, \theta_i)\,d\theta_i$$

and YF denotes a future value of Y. It is this predictive
distribution, f(YF|Y), that the decision maker should use to model
the process generating Y. But, it is also this distribution that,
as mentioned earlier, and will be demonstrated below, is extremely
difficult to derive.

Before applying the preceding methodology to a specific problem, it would be helpful to understand an expansion of (5.9) that arises in the application. (5.9) may be expanded as follows:

$$g''(\theta_i \mid Y) = \left[\frac{\pi_i\, g'(\theta_i)\, f(Y \mid M_i, \theta_i)}{\pi_1 f(Y \mid M_1) + \cdots + \pi_i f(Y \mid M_i) + \cdots + \pi_N f(Y \mid M_N)}\right] + \left\{\frac{g'(\theta_i)\,[\pi_1 f(Y \mid M_1) + \cdots + \pi_{i-1} f(Y \mid M_{i-1}) + \pi_{i+1} f(Y \mid M_{i+1}) + \cdots + \pi_N f(Y \mid M_N)]}{\pi_1 f(Y \mid M_1) + \cdots + \pi_i f(Y \mid M_i) + \cdots + \pi_N f(Y \mid M_N)}\right\}. \tag{5.12}$$

Call  the  first  term  after  the  equal  sign  on  the  right-hand  side 
of  (5.12)  A  and  the  second  term  on  the  right  B.  It  is  B  that  is 
of  interest. 

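$$B = g'(\theta_i)\,[\pi''_1 + \cdots + \pi''_{i-1} + \pi''_{i+1} + \cdots + \pi''_N] \tag{5.13}$$

where

$$\pi''_j = \frac{\pi_j f(Y \mid M_j)}{\pi_1 f(Y \mid M_1) + \cdots + \pi_j f(Y \mid M_j) + \cdots + \pi_N f(Y \mid M_N)}. \tag{5.14}$$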
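The posterior model probability in (5.14) is a normalized product of a prior switching probability and a model likelihood. A minimal Python sketch of that calculation follows; the function name `posterior_model_probs` and the numerical inputs are illustrative assumptions, not part of the dissertation.

```python
def posterior_model_probs(prior_probs, model_likelihoods):
    """Posterior probability that each model was in control, as in (5.14).

    prior_probs[j]       -- pi_j, prior probability that model j is in control
    model_likelihoods[j] -- f(Y | M_j), marginal likelihood of Y under model j
    """
    if len(prior_probs) != len(model_likelihoods):
        raise ValueError("need exactly one likelihood per model")
    joint = [p * f for p, f in zip(prior_probs, model_likelihoods)]
    total = sum(joint)  # the common denominator in (5.14)
    if total <= 0.0:
        raise ValueError("every model assigns zero likelihood to the data")
    return [j / total for j in joint]


# Hypothetical inputs: two models, the second of which fits the observed Y better.
post = posterior_model_probs([0.7, 0.3], [0.02, 0.10])
print(post)
```

With these assumed inputs the second model's probability rises above its prior value because its likelihood is five times larger, which is the behavior (5.14) is meant to capture.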

Since $\pi_j$ equals the probability that model j is in control during any particular period, and $f(Y \mid M_j)$ represents the likelihood of Y being generated by model j, $\pi''_j$ can be interpreted as the posterior probability (i.e., posterior to Y) that model j was in control during the period in which Y was observed. Accordingly, from now on $\pi_j = \pi'_j$ and will be referred to as the prior probability that model j will be in effect during any particular time period.

V.2  Special  Case:  Two  Normal  Models 
The revision methodology developed in Section V.1 is
applied  below  to  the  problem  established  by  the  following 
assumptions: 

(1)  The  decision  maker  knows  that  the  data-generating 
process  in  which  he  is  interested  is  nonstationary. 

(2)  Further,  he  knows  that  the  nonstationarity  is  caused 
by  the  fact  that  in  different  time  periods  one  or  the  other  of 
two  different  processes  is  generating  the  data. 

(3) He cannot identify when a particular process will be in control, but he does know $\pi'_i$, i = 1, 2, the fixed probability that process i will be in control during any particular period.

(4) The decision maker knows that both processes may be best represented by normal models (i.e., normal distributions) with known variance $\sigma^2$, and unknown means, $\mu_1$ and $\mu_2$:

$$M_1: f_N(Y \mid \mu_1, \sigma^2)$$

$$M_2: f_N(Y \mid \mu_2, \sigma^2). \tag{5.15}$$


The decision maker should model the nonstationary process as follows:

$$f(Y \mid \pi'_1, \pi'_2, \mu_1, \mu_2, \sigma^2) = \pi'_1 f_N(Y \mid \mu_1, \sigma^2) + \pi'_2 f_N(Y \mid \mu_2, \sigma^2). \tag{5.16}$$

Since Y is generated each period by either $M_1$ or $M_2$, the data-generating process is nonstationary relative to $M_1$ and $M_2$. Relative to (5.16), however, the process is stationary since the data are assumed to be generated by (5.16) in each period.

For example, suppose the decision maker is a security analyst who views the returns on shares as depending on some event. Conditioned on the event occurring, he might assess the distribution of returns to be $f_N(Y \mid \mu_1, \sigma^2)$. However, if he does not know a priori whether the event will occur, he may assess the marginal distribution of returns to be a bimodal distribution similar to (5.16).¹

As data are observed, (5.2) is used to obtain the posterior distributions of $\mu_i$:

$$g''(\mu_i \mid Y) = \frac{g'(\mu_i)\,\ell(Y \mid \mu_i)}{\int g'(\mu_i)\,\ell(Y \mid \mu_i)\, d\mu_i}, \tag{5.17}$$

i = 1, 2. The functions $g''(\mu_i \mid Y)$, $g'(\mu_i)$, and $\ell(Y \mid \mu_i)$ were all defined above. For convenience, the revision process will be demonstrated via revision of $\mu_1$. Utilizing (5.9), (5.17) becomes

$$g''(\mu_1 \mid Y) = \frac{g'(\mu_1)\,[\pi'_1 f(Y \mid M_1, \mu_1) + \pi'_2 f(Y \mid M_2)]}{\pi'_1 f(Y \mid M_1) + \pi'_2 f(Y \mid M_2)}. \tag{5.18}$$

¹For further details on the security analyst's problem, see James A. Bartos, "The Assessment of Probability Distributions for Future Security Prices," (D.B.A. dissertation, Indiana University, 1969).

Before proceeding with the specifics of the revision process, it is necessary to know the form of the decision-maker's prior distributions on $\mu_1$ and $\mu_2$. It will be assumed that

$$g'(\mu_1) = N(m'_1, \sigma^2/n'_1)$$

$$g'(\mu_2) = N(m'_2, \sigma^2/n'_2), \tag{5.19}$$

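where $n'_i > 0$, i = 1, 2. Continuing with the development of (5.18), $f(Y \mid M_1, \mu_1) = f_N(Y \mid \mu_1, \sigma^2)$ from (5.15). Or, in terms of the sufficient statistic

$$m = \frac{\sum_j Y_j}{n},$$

as derived by factorization in Raiffa and Schlaifer,¹

$$f(Y \mid M_1, \mu_1) = R_1(Y)\, f_N(m \mid M_1, \mu_1, \sigma^2/n), \tag{5.20}$$

where n is the sample size and $R_1(Y)$ is a residual which is a function of Y alone. Similarly,

$$f(Y \mid M_2, \mu_2) = R_2(Y)\, f_N(m \mid M_2, \mu_2, \sigma^2/n). \tag{5.21}$$

Then, using another Raiffa and Schlaifer result:²

$$f(Y \mid M_2) = \int_{-\infty}^{\infty} f(Y \mid M_2, \mu_2)\, g'(\mu_2)\, d\mu_2 \tag{5.22}$$

$$= \int_{-\infty}^{\infty} R_2(Y)\, f_N(m \mid M_2, \mu_2, \sigma^2/n)\, g'(\mu_2)\, d\mu_2$$

$$= R_2(Y)\, f_N\!\left(m \,\Big|\, m'_2, \frac{n'_2 + n}{n'_2 n}\sigma^2\right). \tag{5.23}$$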


¹Raiffa and Schlaifer, p. 294.
²Ibid., p. 296.

All of the terms in (5.18) have now been identified. However, before actually substituting these results into (5.18), it will be shown that $R_1(Y) = R_2(Y)$, thus permitting all further analysis of $g''(\theta_i \mid Y)$ to be done in terms of the sufficient statistic m instead of Y. From (5.20) and (5.21),

$$f(Y \mid \mu_1, M_1) = R_1(Y)\, f_N(m \mid \mu_1, \sigma^2/n, M_1)$$

and

$$f(Y \mid \mu_2, M_2) = R_2(Y)\, f_N(m \mid \mu_2, \sigma^2/n, M_2).$$

More specifically,

$$f(Y \mid \mu_1, M_1) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left[-\frac{1}{2\sigma^2}\sum_j (Y_j - \mu_1)^2\right] \tag{5.24}$$

$$= \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left[-\frac{1}{2\sigma^2}\sum_j (Y_j - m)^2\right] \exp\left[-\frac{n}{2\sigma^2}(m - \mu_1)^2\right]$$

$$= \left\{\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left[-\frac{1}{2\sigma^2}\sum_j (Y_j - m)^2\right] \frac{\sqrt{2\pi}\,\sigma}{\sqrt{n}}\right\} \left\{\frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{n}{2\sigma^2}(m - \mu_1)^2\right]\right\}$$

$$= R_1(Y)\, f_N(m \mid \mu_1, \sigma^2/n, M_1) \tag{5.25}$$

as in (5.20). Notice that $R_1(Y)$ is only a function of the observed data. Thus, if $f(Y \mid \mu_2, M_2)$ were broken down in an analogous fashion, utilizing the same data, $R_2(Y)$ would obviously equal $R_1(Y)$. To see this, substitute $\mu_2$ for $\mu_1$ in (5.25).
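The claim that the residual depends on the data alone can be checked numerically. The sketch below is only an illustration under assumed inputs (the observations, the variance, and the helper names `normal_pdf` and `residual` are hypothetical): it divides the joint normal density of a sample by the density of the sample mean and evaluates the ratio at two different values of the mean parameter.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def residual(ys, mu, sigma2):
    """R(Y) = f(Y | mu) / f_N(m | mu, sigma2 / n) for an i.i.d. normal sample."""
    n = len(ys)
    m = sum(ys) / n  # sufficient statistic: the sample mean
    joint = 1.0
    for y in ys:
        joint *= normal_pdf(y, mu, sigma2)
    return joint / normal_pdf(m, mu, sigma2 / n)


ys = [1.2, 0.7, 1.9, 1.1]  # hypothetical observations
r_a = residual(ys, mu=0.5, sigma2=2.0)
r_b = residual(ys, mu=1.5, sigma2=2.0)
print(r_a, r_b)  # the two ratios agree up to floating-point rounding
```

Because the ratio does not change with the assumed mean, the same residual applies whichever model's mean is substituted, which is why the residuals cancel in the revision formulas.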




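Returning with the above results to the revision process, (5.19), (5.20) and (5.23) are substituted into (5.18) yielding

$$g''(\mu_1 \mid Y) = \frac{g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\left[\pi'_1 R_1(Y)\, f_N(m \mid \mu_1, \sigma^2/n, M_1) + \pi'_2 R_2(Y)\, f_N\!\left(m \mid m'_2, \frac{n'_2 + n}{n'_2 n}\sigma^2\right)\right]}{\pi'_1 R_1(Y)\, f_N\!\left(m \mid m'_1, \frac{n'_1 + n}{n'_1 n}\sigma^2\right) + \pi'_2 R_2(Y)\, f_N\!\left(m \mid m'_2, \frac{n'_2 + n}{n'_2 n}\sigma^2\right)}. \tag{5.26}$$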


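The residuals cancel leaving the following:

$$g''(\mu_1 \mid Y) = \frac{g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\, f_N(m \mid \mu_1, \sigma^2/n, M_1)\, \pi'_1}{\pi'_1 f_N\!\left(m \mid m'_1, \frac{n'_1 + n}{n'_1 n}\sigma^2\right) + \pi'_2 f_N\!\left(m \mid m'_2, \frac{n'_2 + n}{n'_2 n}\sigma^2\right)} + \frac{g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\, \pi'_2 f_N\!\left(m \mid m'_2, \frac{n'_2 + n}{n'_2 n}\sigma^2\right)}{\text{same denominator as above}}. \tag{5.27}$$

Recalling (5.14), (5.27) reduces further to

$$g''(\mu_1 \mid Y) = (\text{same first term}) + g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\, \pi''_2.$$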

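Now, calling the denominator of (5.27) D and multiplying the numerator and denominator of the first term on the rhs of (5.27) by

$$\int g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\, f_N(m \mid \mu_1, \sigma^2/n)\, d\mu_1,$$

the following is obtained:

$$g''(\mu_1 \mid Y) = \frac{1}{D}\left[\frac{g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\, f_N(m \mid \mu_1, \sigma^2/n, M_1)\, \pi'_1}{\int g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\, f_N(m \mid \mu_1, \sigma^2/n)\, d\mu_1}\right] \int g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\, f_N(m \mid \mu_1, \sigma^2/n)\, d\mu_1 + g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\, \pi''_2$$

$$= \frac{1}{D}\, g''_N\!\left(\mu_1 \mid m''_{1.1}, \frac{\sigma^2}{n'_1 + n}\right) \pi'_1 f_N\!\left(m \mid m'_1, \frac{n'_1 + n}{n'_1 n}\sigma^2\right) + g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\, \pi''_2. \tag{5.28}$$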


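Substituting for D in (5.28) and utilizing (5.14), (5.28) reduces to the following:

$$g''(\mu_1 \mid Y) = \frac{g''_N\!\left(\mu_1 \mid m''_{1.1}, \frac{\sigma^2}{n'_1 + n}\right) \pi'_1 f_N\!\left(m \mid m'_1, \frac{n'_1 + n}{n'_1 n}\sigma^2\right)}{\pi'_1 f_N\!\left(m \mid m'_1, \frac{n'_1 + n}{n'_1 n}\sigma^2\right) + \pi'_2 f_N\!\left(m \mid m'_2, \frac{n'_2 + n}{n'_2 n}\sigma^2\right)} + g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\, \pi''_2$$

$$= g''_N\!\left(\mu_1 \mid m''_{1.1}, \frac{\sigma^2}{n'_1 + n}\right) \pi''_1 + g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\, \pi''_2. \tag{5.29}$$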


Given that the observed data were generated by model i, $m''_{i.i}$ represents the posterior mean of $\mu_i$. Equation (5.29) can be readily interpreted. It says that the posterior distribution of $\mu_1$ is a mixture of the posterior distribution of $\mu_1$ that would have resulted had it been known that model 1 generated the data and the distribution of $\mu_1$ that would apply if it were known the data were generated by model 2 (i.e., the prior distribution of $\mu_1$).¹ The mixing weights $\pi''_1$ and $\pi''_2$ are, respectively, the probability that model 1 was in control during the period of observation, and the probability that model 2 was in control during the period of observation.

The form of the posterior mean of $\mu_1$ also has intuitive appeal and can be derived using (5.29) as follows:

$$E''(\mu_1) = \int_{-\infty}^{\infty} \mu_1\, g''(\mu_1 \mid Y)\, d\mu_1 \tag{5.30}$$

$$= \pi''_1 \int_{-\infty}^{\infty} \mu_1\, g''_N\!\left(\mu_1 \mid m''_{1.1}, \frac{\sigma^2}{n'_1 + n}\right) d\mu_1 + \pi''_2 \int_{-\infty}^{\infty} \mu_1\, g'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)\, d\mu_1$$

$$= \pi''_1 m''_{1.1} + \pi''_2 m'_1. \tag{5.31}$$

The posterior mean of $\mu_1$ is thus a mixture of the posterior mean of $\mu_1$ that would have been obtained had it been known that Y was generated by model 1, and the posterior mean of $\mu_1$ (really the prior mean) that would apply if it were known that Y was generated by model 2. The mixing weights are the same as in (5.29).
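A compact numerical sketch of this one-period revision may help fix ideas. It is an illustration under stated assumptions rather than the dissertation's own procedure: the function name `revise_mu1` and all inputs are hypothetical, and the conjugate posterior mean $(n'_1 m'_1 + n m)/(n'_1 + n)$ is the standard normal-normal result, which this section does not write out explicitly.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def revise_mu1(m, n, sigma2, prior_pi, prior_means, prior_ns):
    """One end-of-period revision of mu_1 for two switching normal models."""
    # Marginal likelihood of the sample mean m under each model, as in (5.23).
    marginals = [
        normal_pdf(m, mp, sigma2 * (npr + n) / (npr * n))
        for mp, npr in zip(prior_means, prior_ns)
    ]
    joint = [p * f for p, f in zip(prior_pi, marginals)]
    total = sum(joint)
    post_pi = [j / total for j in joint]  # pi''_1 and pi''_2, as in (5.14)

    m1_prior, n1_prior = prior_means[0], prior_ns[0]
    # Standard conjugate update, relevant if model 1 generated Y (assumption).
    m11 = (n1_prior * m1_prior + n * m) / (n1_prior + n)

    # Mixture components of (5.29): (weight, mean, variance).
    components = [
        (post_pi[0], m11, sigma2 / (n1_prior + n)),  # revised as if M_1 generated Y
        (post_pi[1], m1_prior, sigma2 / n1_prior),   # prior kept, as if M_2 generated Y
    ]
    post_mean = post_pi[0] * m11 + post_pi[1] * m1_prior  # as in (5.31)
    return post_pi, components, post_mean


# Hypothetical inputs chosen so the sample mean is equally far from both prior means.
post_pi, components, post_mean = revise_mu1(
    m=1.0, n=4, sigma2=1.0,
    prior_pi=[0.5, 0.5], prior_means=[0.0, 2.0], prior_ns=[1.0, 1.0],
)
print(post_pi, post_mean)
```

In this symmetric example the data do not favor either model, so the posterior weights stay at one half and the posterior mean of $\mu_1$ lies halfway between its prior mean and the fully revised mean.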


¹If the decision maker wished to revise his prior distribution on the parameters of $M_1$, and knew that Y had been generated by $M_2$, he obviously would not use Y to revise $M_1$'s parameters.

Revision of $\mu_2$'s distribution leads to results analogous to (5.29) and (5.31):

$$g''(\mu_2 \mid Y) = \pi''_1\, g'_N(\mu_2 \mid m'_2, \sigma^2/n'_2) + \pi''_2\, g''_N\!\left(\mu_2 \mid m''_{2.2}, \frac{\sigma^2}{n'_2 + n}\right) \tag{5.32}$$

and

$$E''(\mu_2) = \pi''_1 m'_2 + \pi''_2 m''_{2.2}. \tag{5.33}$$


Throughout  all  of  the  above  analysis  Y  could  be  either  a 
scalar  or  a  vector.  If  it  were  a  vector,  then  it  would  seem  that 
it  is  possible  for  it  to  contain  observations  generated  by  both 
model  1  and  model  2,  in  which  case  the  above  analysis  would  not 
make  sense.  But,  it  was  assumed  that  the  decision  maker  is  able 
to  specify  the  time  periods  (called  here  control  periods)  during 
which  one  or  the  other  of  the  models  is  in  control.  During  these 
control  periods  the  process  of  interest  may  generate  more  than 
one  bit  of  data.  Thus,  Y  can  be  a  scalar  or  a  vector,  but  will 
always  reflect  only  the  data  from  one  of  these  periods  at  a  time, 
which  is  to  say  that  Y  will  never  reflect  data  generated  by  more 
than  one  model.  This  assumption  is  not  restrictive.  For  example, 
if  the  decision  maker  cannot  say  with  certainty  that  the  next  two 
observations  will  be  generated  by  the  same  model,  then  he  must 
reduce  his  control  period  to  that  interval  of  time  during  which 
just  one  Y  value  is  generated. 

The revision procedure developed above utilizes the fact that the next Y observed, be it scalar or vector, contains all the data generated in the next control period. Thus, this revision procedure is an end-of-period revision scheme. Accordingly, revision results (5.29), (5.31), (5.32), and (5.33) apply only at the end of the first control period for which data were available.

The next step in developing the BMSW revision process for this normal-normal case is to revise (5.29) and (5.32) using the data observed during the second control period. Thus, (5.18) must be revised. Equation (5.18) is repeated here with $Y_j$ representing the data from the jth time period:

$$g''(\mu_1 \mid Y_2) = \frac{g'(\mu_1)\,[\pi'_1 f(Y_2 \mid M_1, \mu_1) + \pi'_2 f(Y_2 \mid M_2)]}{\pi'_1 f(Y_2 \mid M_1) + \pi'_2 f(Y_2 \mid M_2)}. \tag{5.34}$$

Functions $g'(\mu_i)$, i = 1, 2, are the posteriors of period one, (5.29) and (5.32), here referred to as being "prior" to the data observed in period two. $\pi'_i$, i = 1, 2, is, as before, the probability that model i is in control during any particular period, in this case period two. $f(Y_2 \mid M_1, \mu_1)$ has exactly the same form as (5.25), leaving only $f(Y_2 \mid M_i)$ to be derived. Unlike the other terms in (5.34), $f(Y_j \mid M_i)$, being conditioned on all available information about $\mu_i$, changes from period to period as new information is incorporated into the probability distributions of model i's parameters. This can be seen by examining the way in which $f(Y_j \mid M_i)$ is defined:

$$f(Y_j \mid M_i) = \int f(Y_j \mid M_i, \mu_i)\, g''(\mu_i \mid Y_{j-1})\, d\mu_i.^{1} \tag{5.35}$$

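The function $f(Y_2 \mid M_2)$ is derived here. By (5.25), $f(Y_2 \mid M_2, \mu_2) = R_2(Y_2)\, f_N(m \mid \mu_2, \sigma^2/n, M_2)$. The function $g''(\mu_2 \mid Y_1)$ is the mixture of normals displayed in (5.32). Thus,

$$f(Y_2 \mid M_2) = \int R_2(Y_2)\, f_N(m \mid \mu_2, \sigma^2/n, M_2) \left[\pi''_1 f'_N(\mu_2 \mid m'_2, \sigma^2/n'_2) + \pi''_2 f''_N\!\left(\mu_2 \mid m''_{2.2}, \frac{\sigma^2}{n'_2 + n}\right)\right] d\mu_2 \tag{5.36}$$

$$= R_2(Y_2)\left[\pi''_1 \int f_N(m \mid \mu_2, \sigma^2/n, M_2)\, f'_N(\mu_2 \mid m'_2, \sigma^2/n'_2)\, d\mu_2 + \pi''_2 \int f_N(m \mid \mu_2, \sigma^2/n, M_2)\, f''_N\!\left(\mu_2 \mid m''_{2.2}, \frac{\sigma^2}{n'_2 + n}\right) d\mu_2\right]$$

$$= R_2(Y_2)\left[\pi''_1 f_N\!\left(m \mid m'_2, \frac{n'_2 + n}{n n'_2}\sigma^2\right) + \pi''_2 f_N\!\left(m \mid m''_{2.2}, \frac{2n + n'_2}{n(n'_2 + n)}\sigma^2\right)\right]. \tag{5.37}$$

Similarly,

$$f(Y_2 \mid M_1) = R_1(Y_2)\left[\pi''_1 f_N\!\left(m \mid m''_{1.1}, \frac{2n + n'_1}{n(n'_1 + n)}\sigma^2\right) + \pi''_2 f_N\!\left(m \mid m'_1, \frac{n + n'_1}{n n'_1}\sigma^2\right)\right]. \tag{5.38}$$

¹Even though $g''(\mu_i \mid Y_{j-1})$ is written as being conditioned on only $Y_{j-1}$, it is, of course, conditioned on all the observed data of periods one through j-1.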




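Before (5.37) is substituted into (5.34), it will be helpful to introduce additional notation.

1) Let the prior distribution of $\mu_i$, $f'_N(\mu_i \mid m'_i, \sigma^2/n'_i)$, be denoted by $f'_{\mu_i}$.

2) Let the posterior distribution of $\mu_i$ after one period, where the revision was performed assuming $Y_1$ was generated by model i, $f''_N(\mu_i \mid m''_{i.i}, \sigma^2/(n'_i + n))$, be denoted by $f''_{\mu_i}$.

3) Let the likelihood function of the sufficient statistic m, when it is assumed that $Y_j$ is generated by model i with mean $\mu_i$, $f_N(m \mid \mu_i, \sigma^2/n, M_i)$, be denoted by $f_{m_i}$.

4) Let the model likelihood for model i in the first control period in terms of the sufficient statistic m, $f_N\!\left(m \mid m'_i, \frac{n + n'_i}{n n'_i}\sigma^2\right)$, be denoted by $f_{M_i}$.

5) Let $f_N\!\left(m \mid m''_{i.i}, \frac{2n + n'_i}{n(n'_i + n)}\sigma^2\right)$, the model likelihood for model i in the second control period conditioned on model i having generated $Y_1$ (in terms of the sufficient statistic m), be denoted by $f^2_{M_i}$.

6) Let the posterior distribution of $\mu_i$ after period two be denoted by ${f''}^{1,1}_{\mu_i}$, ${f''}^{1,2}_{\mu_i}$, ${f''}^{2,1}_{\mu_i}$, or ${f''}^{2,2}_{\mu_i}$ as it is assumed that $Y_1$ was generated by model 1 and $Y_2$ by model 1, or $Y_1$ by model 1 and $Y_2$ by model 2, or $Y_1$ by model 2 and $Y_2$ by model 1, or $Y_1$ by model 2 and $Y_2$ by model 2, respectively.

Now, substituting (5.37) into (5.34), which is repeated here, the following is obtained:

$$g''(\mu_1 \mid Y_2) = \frac{g'(\mu_1)\,[\pi'_1 f(Y_2 \mid M_1, \mu_1) + \pi'_2 f(Y_2 \mid M_2)]}{\pi'_1 f(Y_2 \mid M_1) + \pi'_2 f(Y_2 \mid M_2)}$$

$$= \frac{[\pi''_1 f''_{\mu_1} + \pi''_2 f'_{\mu_1}]\,[\pi'_1 f_{m_1} + \pi'_2 \pi''_1 f_{M_2} + \pi'_2 \pi''_2 f^2_{M_2}]}{\int [\text{numerator}]\, d\mu_1}, \tag{5.39}$$

where the residuals, $R_1(Y_2)$ and $R_2(Y_2)$, have been excluded since they cancel. Expanding (5.39),

$$g''(\mu_1 \mid Y_2) = \frac{\pi''_1 \pi'_1 f''_{\mu_1} f_{m_1} + [\pi''_1 \pi'_2 \pi''_1 f_{M_2} + \pi''_1 \pi'_2 \pi''_2 f^2_{M_2}]\, f''_{\mu_1} + \pi''_2 \pi'_1 f'_{\mu_1} f_{m_1} + [\pi''_2 \pi'_2 \pi''_1 f_{M_2} + \pi''_2 \pi'_2 \pi''_2 f^2_{M_2}]\, f'_{\mu_1}}{\int [\text{numerator}]\, d\mu_1} \tag{5.40}$$

$$= \frac{\pi''_1 \pi'_1 f''_{\mu_1} f_{m_1} + [\pi''_1 \pi'_2 \pi''_1 f_{M_2} + \pi''_1 \pi'_2 \pi''_2 f^2_{M_2}]\, f''_{\mu_1} + \pi''_2 \pi'_1 f'_{\mu_1} f_{m_1} + [\pi''_2 \pi'_2 \pi''_1 f_{M_2} + \pi''_2 \pi'_2 \pi''_2 f^2_{M_2}]\, f'_{\mu_1}}{\pi''_1 \pi'_1 \int f''_{\mu_1} f_{m_1}\, d\mu_1 + [\pi''_1 \pi'_2 \pi''_1 f_{M_2} + \pi''_1 \pi'_2 \pi''_2 f^2_{M_2}] + \pi''_2 \pi'_1 \int f'_{\mu_1} f_{m_1}\, d\mu_1 + [\pi''_2 \pi'_2 \pi''_1 f_{M_2} + \pi''_2 \pi'_2 \pi''_2 f^2_{M_2}]}. \tag{5.41}$$

Let $I_1 = \int f'_{\mu_1} f_{m_1}\, d\mu_1$ and $I_2 = \int f''_{\mu_1} f_{m_1}\, d\mu_1$. Noticing that

$$\frac{f''_{\mu_1} f_{m_1}}{I_2} = {f''}^{1,1}_{\mu_1}, \quad \frac{f'_{\mu_1} f_{m_1}}{I_1} = {f''}^{2,1}_{\mu_1}, \quad f''_{\mu_1} = {f''}^{1,2}_{\mu_1}, \quad \text{and} \quad f'_{\mu_1} = {f''}^{2,2}_{\mu_1},$$

multiply and divide the first and third terms in the numerator of (5.41) by $I_2$ and $I_1$, respectively. Then,

$$g''(\mu_1 \mid Y_2) = \frac{[I_2 \pi''_1 \pi'_1]\, {f''}^{1,1}_{\mu_1} + [\pi''_1 \pi'_2 \pi''_1 f_{M_2} + \pi''_1 \pi'_2 \pi''_2 f^2_{M_2}]\, {f''}^{1,2}_{\mu_1} + [I_1 \pi''_2 \pi'_1]\, {f''}^{2,1}_{\mu_1} + [\pi''_2 \pi'_2 \pi''_1 f_{M_2} + \pi''_2 \pi'_2 \pi''_2 f^2_{M_2}]\, {f''}^{2,2}_{\mu_1}}{[\pi''_1 \pi'_1 I_2] + [\pi''_1 \pi'_2 \pi''_1 f_{M_2} + \pi''_1 \pi'_2 \pi''_2 f^2_{M_2}] + [\pi''_2 \pi'_1 I_1] + [\pi''_2 \pi'_2 \pi''_1 f_{M_2} + \pi''_2 \pi'_2 \pi''_2 f^2_{M_2}]}. \tag{5.42}$$

Using a by now familiar Raiffa and Schlaifer result,

$$I_1 = \int f'_{\mu_1} f_{m_1}\, d\mu_1 = f_N\!\left(m \mid m'_1, \frac{n + n'_1}{n n'_1}\sigma^2\right) = f_{M_1}$$

and

$$I_2 = \int f''_{\mu_1} f_{m_1}\, d\mu_1 = f_N\!\left(m \mid m''_{1.1}, \frac{2n + n'_1}{n(n'_1 + n)}\sigma^2\right) = f^2_{M_1}.$$

Substituting these results in (5.42) and factoring $\pi''_1$ and $\pi''_2$ out of the second and fourth terms in the denominator yields:

$$g''(\mu_1 \mid Y_2) = \frac{[\pi''_1 \pi'_1 f^2_{M_1}]\, {f''}^{1,1}_{\mu_1} + [\pi''_1 \pi'_2 \pi''_1 f_{M_2} + \pi''_1 \pi'_2 \pi''_2 f^2_{M_2}]\, {f''}^{1,2}_{\mu_1} + [\pi''_2 \pi'_1 f_{M_1}]\, {f''}^{2,1}_{\mu_1} + [\pi''_2 \pi'_2 \pi''_1 f_{M_2} + \pi''_2 \pi'_2 \pi''_2 f^2_{M_2}]\, {f''}^{2,2}_{\mu_1}}{[\pi''_1 \pi'_1 f^2_{M_1}] + \pi''_1 [\pi'_2 \pi''_1 f_{M_2} + \pi'_2 \pi''_2 f^2_{M_2}] + [\pi''_2 \pi'_1 f_{M_1}] + \pi''_2 [\pi'_2 \pi''_1 f_{M_2} + \pi'_2 \pi''_2 f^2_{M_2}]}. \tag{5.43}$$

Noticing that $\pi''_1 + \pi''_2 = 1$, the denominator of (5.43) simplifies, giving

$$g''(\mu_1 \mid Y_2) = \frac{[\pi''_1 \pi'_1 f^2_{M_1}]\, {f''}^{1,1}_{\mu_1} + \pi''_1 [\pi''_1 \pi'_2 f_{M_2}]\, {f''}^{1,2}_{\mu_1} + \pi''_1 [\pi''_2 \pi'_2 f^2_{M_2}]\, {f''}^{1,2}_{\mu_1} + [\pi''_2 \pi'_1 f_{M_1}]\, {f''}^{2,1}_{\mu_1} + \pi''_2 [\pi''_1 \pi'_2 f_{M_2}]\, {f''}^{2,2}_{\mu_1} + \pi''_2 [\pi''_2 \pi'_2 f^2_{M_2}]\, {f''}^{2,2}_{\mu_1}}{[\pi''_1 \pi'_1 f^2_{M_1}] + [\pi''_1 \pi'_2 f_{M_2}] + [\pi''_2 \pi'_2 f^2_{M_2}] + [\pi''_2 \pi'_1 f_{M_1}]} \tag{5.44}$$

$$= \pi''_{1,1}\, {f''}^{1,1}_{\mu_1} + \pi''_1 \pi''_{1,2}\, {f''}^{1,2}_{\mu_1} + \pi''_1 \pi''_{2,2}\, {f''}^{1,2}_{\mu_1} + \pi''_{2,1}\, {f''}^{2,1}_{\mu_1} + \pi''_2 \pi''_{1,2}\, {f''}^{2,2}_{\mu_1} + \pi''_2 \pi''_{2,2}\, {f''}^{2,2}_{\mu_1} \tag{5.45}$$

$$= \pi''_{1,1}\, {f''}^{1,1}_{\mu_1} + \pi''_1 [\pi''_{1,2} + \pi''_{2,2}]\, {f''}^{1,2}_{\mu_1} + \pi''_{2,1}\, {f''}^{2,1}_{\mu_1} + \pi''_2 [\pi''_{1,2} + \pi''_{2,2}]\, {f''}^{2,2}_{\mu_1}, \tag{5.46}$$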


where, for example,

$$\pi''_{1,1} = \frac{\pi''_1 \pi'_1 f^2_{M_1}}{[\pi''_1 \pi'_1 f^2_{M_1}] + [\pi''_1 \pi'_2 f_{M_2}] + [\pi''_2 \pi'_2 f^2_{M_2}] + [\pi''_2 \pi'_1 f_{M_1}]}. \tag{5.47}$$

Looking at the numerator of (5.47), $\pi''_1$ is the posterior probability that model 1 was in control during period one; $\pi'_1$ is the prior probability that model 1 is in control during period two; $f^2_{M_1}$ is proportional to the likelihood of observing $Y_2$ given that model 1 was in control during period one and that model 1 is in control during period two. Thus, $\pi''_{1,1}$ may be interpreted as the posterior probability that model 1 was in control during period one and that model 1 was in control during period two. $\pi''_{1,1}$ is the posterior form of $\pi''_1 \pi'_1$.

To summarize, after the second control period we have the following results:

$$g''(\mu_1 \mid Y_2) = \pi''_{1,1}\, {f''}^{1,1}_{\mu_1} + \pi''_1 [\pi''_{1,2} + \pi''_{2,2}]\, {f''}^{1,2}_{\mu_1} + \pi''_{2,1}\, {f''}^{2,1}_{\mu_1} + \pi''_2 [\pi''_{1,2} + \pi''_{2,2}]\, {f''}^{2,2}_{\mu_1} \tag{5.48}$$

$$g''(\mu_2 \mid Y_2) = \pi''_{2,2}\, {f''}^{2,2}_{\mu_2} + \pi''_2 [\pi''_{2,1} + \pi''_{1,1}]\, {f''}^{2,1}_{\mu_2} + \pi''_{1,2}\, {f''}^{1,2}_{\mu_2} + \pi''_1 [\pi''_{2,1} + \pi''_{1,1}]\, {f''}^{1,1}_{\mu_2}. \tag{5.49}$$

Since ${f''}^{i,j}_{\mu_1}$ and ${f''}^{i,j}_{\mu_2}$, i, j = 1, 2, were obtained by the application of Bayes' rule to normal prior distributions and likelihood functions whose form is normal (see equations (5.41) and (5.42)), ${f''}^{i,j}_{\mu_1}$ and ${f''}^{i,j}_{\mu_2}$, i, j = 1, 2, are normal distributions. Thus, (5.48) and (5.49) are mixtures of normal distributions.
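The weights in (5.47) and the mixing coefficients of (5.48) can be sketched numerically as follows. This is an illustrative reading of those equations, not the author's code: the function name `two_period_weights` is hypothetical, and the four likelihood values passed in are arbitrary stand-ins for $f_{M_1}$, $f^2_{M_1}$, $f_{M_2}$ and $f^2_{M_2}$ evaluated at the second-period sample mean.

```python
def two_period_weights(post_pi, prior_pi, f_m1, f2_m1, f_m2, f2_m2):
    """Sequence weights pi''_{i,j} as in (5.47) and the coefficients of (5.48)."""
    p1, p2 = post_pi    # pi''_1, pi''_2: posterior weights after period one
    q1, q2 = prior_pi   # pi'_1, pi'_2: fixed switching probabilities
    terms = {
        (1, 1): p1 * q1 * f2_m1,  # model 1 in period one, model 1 in period two
        (1, 2): p1 * q2 * f_m2,   # model 1 in period one, model 2 in period two
        (2, 1): p2 * q1 * f_m1,   # model 2 in period one, model 1 in period two
        (2, 2): p2 * q2 * f2_m2,  # model 2 in period one, model 2 in period two
    }
    total = sum(terms.values())   # the denominator of (5.47)
    w = {key: value / total for key, value in terms.items()}

    shared = w[(1, 2)] + w[(2, 2)]
    coeffs = {                    # coefficients on the four normals in (5.48)
        (1, 1): w[(1, 1)],
        (1, 2): p1 * shared,
        (2, 1): w[(2, 1)],
        (2, 2): p2 * shared,
    }
    return w, coeffs


# Hypothetical likelihood values, for illustration only.
w, coeffs = two_period_weights((0.6, 0.4), (0.5, 0.5),
                               f_m1=0.2, f2_m1=0.3, f_m2=0.1, f2_m2=0.05)
print(w)
print(coeffs)
```

Because $\pi''_1 + \pi''_2 = 1$, the four coefficients sum to one, so (5.48) is a proper mixture of the four normal posteriors.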

The form of the posterior distributions after period 1 (see equation (5.29)) and after period 2 (see equation (5.48)) indicates that $g''(\mu_i \mid Y_n)$ will be a mixture of $2^n$ normal distributions. This is a consequence of the special nature of the likelihood function used in the BMSW revision process. Because of the likelihood function's dependence on the revision results of previous periods, it expands as n gets large. It can be seen by examining equations (5.25), (5.34) and (5.37) that when used for revising $g''(\mu_i \mid Y_{n-1})$, the form of the likelihood function is that of a mixture of $2^{n-1} + 1$ normal distributions. With the likelihood function, and therefore the posterior distribution of $\mu_i$, expanding as n increases, the revision procedure described above becomes computationally unwieldy. Even when n = 2 the procedure is quite messy. Swamy and Mehta ran into a similar problem in their work. They noted that ". . . any reasonable prior distribution on $\theta$ that we may have leads to integrals which cannot all be expressed in closed form and, as a result, the Bayesian argument is numerically most complex to execute."¹
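The growth described above is easy to tabulate. The short sketch below simply evaluates the two counts stated in the text, $2^{n-1} + 1$ components in the likelihood and $2^n$ in the posterior, for the first few control periods; the function name `component_counts` is a hypothetical label.

```python
def component_counts(periods):
    """Rows of (period n, likelihood components 2**(n-1) + 1, posterior components 2**n)."""
    rows = []
    for n in range(1, periods + 1):
        rows.append((n, 2 ** (n - 1) + 1, 2 ** n))
    return rows


for n, likelihood_terms, posterior_terms in component_counts(10):
    print(n, likelihood_terms, posterior_terms)
```

After ten control periods the posterior already contains 1,024 normal components, which illustrates why the procedure becomes unwieldy so quickly.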

While  BMSW  is  certainly  a  feasible  procedure  for  handling 
many  model  nonstationarity  problems,  its  computational  complexities 
make  it  extremely  costly  to  implement.  Thus,  unless  n  is  quite 
small,  or  the  rewards  from  being  able  to  utilize  both  subjective 
and  sample  information  in  analyzing  "switching"  data-generating 
processes  are  quite  large,  the  BMSW  approach  is  probably  too 
costly  to  be  of  much  practical  value.  BMSW  does,  however,  serve 
to  demonstrate  the  difficulties  that  can  arise  in  modeling  non- 
stationary  processes  via  mixtures  of  posterior  distributions. 


¹Swamy and Mehta, pp. 594-595.


CHAPTER  VI 
CONCLUDING COMMENTS AND SUGGESTIONS FOR FURTHER RESEARCH

This dissertation has dealt with the treatment of uncertainty concerning the specification of data-generating processes in statistical decision problems. The point of view taken has been that a decision maker's model specification uncertainty is in itself information about the data-generating process and should be appropriately reflected in the decision-maker's decision analysis. Thus, in modelling data-generating processes, the emphasis should not be on methods, such as model selection procedures, which discard relevant information and mask model specification uncertainty, but rather on methods, such as the Bayesian Model Comparison procedure, which formally collect and display information about the data-generating process. The BMC procedure was shown to be an appropriate and natural way to formally extend the parametric analysis of statistical decision problems to include consideration of the possibly widely differing predictive and decision-making implications of alternative specifications of the data-generating process. Accordingly, this dissertation has advocated the use of the BMC procedure in decision-making problems in which model specification uncertainty is present.

In Chapter III, forecasting with and without formal regard for model specification uncertainty was examined via comparisons of BMC forecasting with BMS forecasting and, indirectly, forecasts obtained using the max-$R^2$ model selection procedure. It was shown that as a result of not appropriately treating model specification uncertainty, BMS point forecasts may be misplaced, BMS credible interval forecasts may misspecify their reliability, and the predictive distribution yielded by the BMS procedure misspecifies forecast-risk.

In  Chapter  IV,  the  BMC  procedure  was  applied  to  single-period 
economic  control  problems.  In  particular,  certainty-equivalent  and 
optimal  analytic  solutions  were  found  for  simple  two-model  control 
problems  in  which  control  is  cost-free  and  in  which  instrument-use 
cost  functions  are  known. 

In Chapter V, an approach to handling nonstationary data-generating processes was introduced. Called Bayesian Model Switching, the approach models the random variable upon which a decision hinges as being generated by different statistical models in different time periods with the switch between models described by a multinomial process. The BMSW methodology can be used to make inferential statements about nonstationary processes that have been identified as fitting the specifications of the BMSW model. Unfortunately, however, the procedure proves to be computationally unwieldy. Even for the straightforward case of two switching normal models, the probability revision involved becomes computationally burdensome after only two time periods.

The remainder of this chapter will be devoted to discussions of (1) problems encountered in the course of the research for this dissertation, (2) the shortcomings of the BMC procedure, and (3) possible extensions to the work of this dissertation and suggestions for further research in the area of model specification uncertainty.

VI.1 Research Difficulties Encountered
The  most  significant  difficulties  encountered  in  the  research  for 
this  dissertation  arose  in  the  application  of  BMC  control  to  complex 
model  spaces,  and  in  the  probability  revision  scheme  of  the  Bayesian 
Model  Switching  methodology.  These  difficulties  are  discussed  in  this 
section. 

As noted in Chapter IV, analytic BMC control solutions, whether certainty-equivalent or optimal, may not be easy to obtain. Difficulties arise when model spaces are utilized that include one or more models in which the target variable is characterized as being a function of two or more instruments. In such cases, BMC control solutions can only be obtained by solving simultaneously a set of possibly very complex equations. The reason for this can be seen by examining the general form of the BMC control problem for the following model space:

$$M_1: y = \beta_1 X + \epsilon; \tag{6.1}$$

$$M_2: y = \beta_2 X + \beta_3 Z + \epsilon. \tag{6.2}$$

The optimal BMC control settings for $X_F$ and $Z_F$ would be obtained by solving:

$$\min_{X_F, Z_F} \left[P''(M_1 \mid y, X) \int L(y_F, y_F^*)\, f(y_F \mid M_1, y, X, X_F)\, dy_F + P''(M_2 \mid y, X, Z) \int L(y_F, y_F^*)\, f(y_F \mid M_2, y, X, X_F, Z, Z_F)\, dy_F\right]. \tag{6.3}$$

Refer to Chapter IV for definitions of the terms in (6.1), (6.2), and (6.3). Both terms in the sum of (6.3) depend on $X_F$. Accordingly, the minimization problem cannot be separated into two independent minimization problems as was done repeatedly in Chapter IV, but must be solved by taking partial derivatives of the entire expression in brackets, setting the resulting derivatives equal to zero, and solving the resulting equations simultaneously for $X_F$ and $Z_F$. For the model space given above, this solution is not difficult to obtain. However, when interaction or other higher order terms are included in one or more of the models, the solution may be computationally quite complex, perhaps involving the finding of the roots of a higher order polynomial. Even when solutions to more complex control problems can be efficiently obtained via the BMC procedure, such solutions frequently result in awkward analytical expressions for the instruments, making their policy implications difficult to ascertain.
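For the simple model space of (6.1) and (6.2), the simultaneous solution can be sketched under one concrete set of assumptions. The sketch assumes a quadratic loss $L = (y_F - y_F^*)^2$ and summarizes each model's posterior by coefficient means (`b1`, `b2`, `b3`), variances (`v1`, `v2`, `v3`) and one covariance (`c23`); these names, the loss function, and the function `bmc_quadratic_control` are illustrative choices, not taken from Chapter IV. Under those assumptions the two first-order conditions are linear in $X_F$ and $Z_F$ and can be solved by Cramer's rule.

```python
def bmc_quadratic_control(target, p1, p2, b1, b2, b3,
                          v1=0.0, v2=0.0, v3=0.0, c23=0.0):
    """Minimize p1*E[(y_F - target)^2 | M_1] + p2*E[(y_F - target)^2 | M_2].

    Assumed expected losses (additive error variance omitted, since it is constant):
      M_1: (b1*x - target)^2 + v1*x^2
      M_2: (b2*x + b3*z - target)^2 + v2*x^2 + 2*c23*x*z + v3*z^2
    Setting both partial derivatives to zero gives a 2x2 linear system.
    """
    a11 = p1 * (b1 ** 2 + v1) + p2 * (b2 ** 2 + v2)
    a12 = p2 * (b2 * b3 + c23)
    a22 = p2 * (b3 ** 2 + v3)
    r1 = target * (p1 * b1 + p2 * b2)
    r2 = target * p2 * b3
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("first-order conditions are singular")
    x = (r1 * a22 - a12 * r2) / det  # Cramer's rule for X_F
    z = (a11 * r2 - a12 * r1) / det  # Cramer's rule for Z_F
    return x, z


# Hypothetical case with no parameter uncertainty: both models can hit the target.
x_f, z_f = bmc_quadratic_control(target=4.0, p1=0.5, p2=0.5, b1=2.0, b2=1.0, b3=1.0)
print(x_f, z_f)
```

Even in this deliberately simple setting the $X_F$ equation involves both models' posterior probabilities, which is the coupling the text describes; with interaction or higher order terms the first-order conditions would no longer be linear.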

In Chapter V, it was explained that the computational complexities of the BMSW procedure are the result of the special nature of the likelihood function used in the BMSW probability revision process. The likelihood function is dependent on the previous period's revision results and, consequently, expands with each succeeding revision. This dependence occurs because the BMSW likelihood function is designed to reflect the likelihood of all n periods of data observed to date having been generated by any possible sequence of the switching models. The need for such a complex likelihood function arises because of the decision-maker's inability to identify which of the switching models has generated the observed realizations of the data-generating process. The computational inefficiencies of the BMSW procedure are, therefore, a result of the way in which the BMSW methodology treats the identification problem.

The  next  section  discusses  reasons  for  the  limited  applicability 
of  the  BMC  procedure  and,  in  so  doing,  points  out  areas  in  which 
further  research  is  needed. 

VI.2 Shortcomings of the BMC Procedure and Suggestions
for Further Research

Two  of  the  major  difficulties  encountered  in  the  application  of 
the  BMC  procedure  are  technical  limitations  which  are  common  to  Bayesian 
inference  in  general.  The  first  has  to  do  with  the  assessment  of  prior 
probability  distributions;  the  second  with  the  computational  methods 
required  for  performing  probability  revision. 

If a decision maker has substantial prior information in the form of his own judgments or experience, it is important that this information be reflected in his prior probability distribution over the models, $P'(M_i)$, i = 1, ..., N, and his prior probability distribution over the parameters, $g'(\theta_i \mid M_i)$, i = 1, ..., N. The assessment of judgmental priors, however, for problems involving many parameters is a considerable task. Even in situations in which the decision maker does not have prior information, as Gaver and Geisel have pointed out,¹ the choice of an informationless prior may be quite difficult and the results may be biased in favor of the model with the most parameters. Probability assessment for problems of this nature appears to be an area deserving substantial research.

In applying the BMC procedure, probability revision must be performed
many times. With each successive set of sample observations
obtained, revision is performed on each model's prior parameter
distribution and on each prior model probability. Currently, however,
computationally efficient methods of performing this revision exist
for only a few classes of statistical models. Thus, in
order for the BMC procedure to become more generally applicable, work
is needed in expanding the classes of models which can be conveniently
handled by Bayesian inferential procedures.
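
The revision of the model probabilities themselves can be sketched as follows. This is only an illustration under the assumption that the marginal likelihood of the new observations under each model has already been obtained, which is precisely the step that is convenient for only a few model classes. The function name and the numerical values are hypothetical.

```python
import math

def revise_model_probabilities(prior_probs, log_marginal_likelihoods):
    """Revise prior model probabilities by Bayes' rule.

    prior_probs: prior probabilities of the N models (summing to one).
    log_marginal_likelihoods: log of the marginal likelihood of the new
        sample under each model (assumed to be supplied by the caller).
    Returns the posterior model probabilities.
    """
    # Work in logs so that very small likelihoods do not underflow.
    log_weights = [
        (math.log(p) if p > 0 else float("-inf")) + ll
        for p, ll in zip(prior_probs, log_marginal_likelihoods)
    ]
    largest = max(log_weights)
    weights = [math.exp(lw - largest) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-model example: equal priors, and a sample that is
# three times as likely under the second model as under the first.
probs = [0.5, 0.5]
log_ml = [math.log(0.2), math.log(0.6)]
probs = revise_model_probabilities(probs, log_ml)   # first sample
probs = revise_model_probabilities(probs, log_ml)   # second, similar sample
```

Because the posterior from one sample serves as the prior for the next, the same function is applied once per successive set of observations.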

The form of the predictive distribution yielded by the BMC procedure
can also make the application of the procedure computationally
inconvenient. Recall from equation (2.15) that the BMMP is a mixture,
that is, a linear combination, of single-model predictive distributions.
Accordingly, the BMMP is quite difficult to display graphically, and
numerical methods may be required to find the probability of $y_p$ taking
on a value in a particular region, i.e., to find $P[a \le y_p \le b]$, where
$a$ and $b$ are constants. Thus, a significant amount of effort may be
required to find credible intervals and, depending on the decision
maker's loss function, point estimates for $y_p$.
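
The kind of numerical work involved can be sketched briefly. The example below assumes, purely for illustration, that each single-model predictive distribution is normal (in practice the components would often be Student-t or another form), and all function names and numbers are hypothetical. The interval probability is a weighted sum of component probabilities, whereas quantiles of the mixture have no closed form and are found here by bisection.

```python
from statistics import NormalDist

def mixture_cdf(y, weights, components):
    """CDF of the mixture: the weighted sum of the component CDFs."""
    return sum(w * c.cdf(y) for w, c in zip(weights, components))

def interval_probability(a, b, weights, components):
    """Probability that the predicted quantity falls between a and b."""
    return mixture_cdf(b, weights, components) - mixture_cdf(a, weights, components)

def mixture_mean(weights, components):
    """Mean of the mixture (the point estimate under quadratic loss)."""
    return sum(w * c.mean for w, c in zip(weights, components))

def mixture_quantile(p, weights, components, iterations=200):
    """Quantile of the mixture found by bisection (requires 0 < p < 1)."""
    # The mixture quantile lies between the smallest and largest
    # component quantiles, which gives a valid starting bracket.
    lo = min(c.inv_cdf(p) for c in components)
    hi = max(c.inv_cdf(p) for c in components)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, components) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equal_tailed_interval(level, weights, components):
    """Equal-tailed credible interval with the given probability content."""
    tail = (1.0 - level) / 2.0
    return (mixture_quantile(tail, weights, components),
            mixture_quantile(1.0 - tail, weights, components))

# Hypothetical posterior model probabilities and normal predictive components.
weights = [0.3, 0.7]
components = [NormalDist(mu=10.0, sigma=2.0), NormalDist(mu=14.0, sigma=1.0)]
prob = interval_probability(9.0, 13.0, weights, components)
interval = equal_tailed_interval(0.90, weights, components)
```

The mean is available directly, but a median (absolute-error loss) or an interval requires the iterative search, and the effort grows with the number of models in the mixture.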


Gaver  and  Geisel,  pp.  62-72. 
The BMC procedure is also limited in its applicability by the
requirement that the decision maker must assign prior model probabilities
as if he believes the correct model of the data-generating process
is  represented  in  his  model  space.  No  formal  provision  exists  within 
the  BMC  procedure  for  the  decision  maker  to  express  doubt  concerning 
whether  his  model  space  contains  the  correct  model.  This  then  is  a 
weakness  in  both  the  BMC  and  BMS  procedures,  for  certainly  most 
decision  makers  would  admit  to  some  doubt  concerning  the  existence  of 
the  correct  model  in  a  particular  model  space  no  matter  how  extensive 
the model space. What is needed is a procedure for making inferences
about a set of N models consisting of N-1 well-defined statistical
models and one ill-defined dummy model representing the infinitely
many models not explicitly included in the set of N-1 models. Until
such  a  procedure  is  available,  the  meaningfulness  of  the  prior  model 
probabilities  and,  therefore,  the  results  of  the  BMC  procedure  are 
dependent  on  the  decision  maker  choosing  his  model  space  so  as  to  make 
virtually  insignificant  his  doubt  concerning  whether  he  has  in  fact 
captured  the  correct  model  of  the  data-generating  process  in  his  model 
space. 

The  next  section  offers  suggestions  for  further  research  in  the 
areas  of  economic  control  and  model  nonstationarity. 

VI.3 Suggestions for Further Research in the Areas of
Economic Control and Model Nonstationarity

Several important questions concerning BMC control solutions
remain unanswered. These questions, along with suggestions for
further research in the area of model nonstationarity, are outlined
in this section.

The  details  of  the  BMC  approach  to  handling  model  specification 
uncertainty  in  economic  control  problems  were  presented  in  Chapter  IV. 
Now  what  is  needed  are  answers  to  the  following  questions: 

(1)  Is  the  BMC  approach  to  control  equivalent  to  the  usual 
Bayesian  approach  in  which  all  N  alternative  models  are 
nested  in  a  single  model? 

(2)  A  popular  classical  approach  to  control  involves  including 
all  instruments  thought  to  be  related  to  the  target  variable 
in  a  single  model,  estimating  the  parameters  of  that  model, 
and  testing  hypotheses  about  the  model  parameters  in  order 
to establish a single model of the process whose control
is desired. The question is: How does the BMC approach
to  control  perform  relative  to  this  classical  approach? 

(3)  Is  there  a  problem  in  logically  assessing  prior  model 
probabilities  for,  say,  three  models  when  the  third  model 
is  a  nest  of  the  first  two  models? 

If  BMC  control  procedures  compare  favorably  with  other  Bayesian  and 
classical approaches, solutions should be derived and their policy-making
implications explored for a variety of model spaces, i.e., for
model  spaces  of  differing  dimensions  containing  models  of  differing 
functional  form. 

For further study of model nonstationarity via the BMSW approach,
the next step should be either to develop models with simpler
likelihood functions or to search for asymptotic properties of BMSW
probability revision that would at least permit BMSW inferential
statements to be approximated. Assuming the computational
inefficiencies of the BMSW procedure can be overcome, other
specific  forms  of  nonstationarity  (recall  that  in  Chapter  V,  the 
switching  models  were  controlled  by  a  multinomial  process)  should  be 
considered  in  the  context  of  BMSW.  For  example,  a  Markov  process  might 
be  utilized  to  control  the  switches  between  models,  or  perhaps  the 
model  switches  might  be  controlled  by  a  Bernoulli  process  with  an 
exponential  process  dictating  the  time  the  switch  is  to  occur.  The 
development  of  BMSW  methodology  for  Bernoulli  and  Markov  switching 
control  processes  in  which  the  transition  probabilities  are  unknown  would 
help  add  even  more  realism  to  the  modeling  of  nonstationary  processes. 
Whether  or  not  the  computational  inefficiencies  of  the  BMSW  methodology 
can be overcome, it would seem sensible to approach the study of
nonstationary data-generating processes by initially examining highly
specific  forms  of  nonstationarity  as  was  done  in  Chapter  V  and  suggested 
above. 
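
To make the suggested Markov form of nonstationarity concrete, the following sketch simulates data from two no-intercept linear models whose switches are governed by a transition matrix. It illustrates only the data-generating process, not BMSW inference, and the function name, the coefficients, and the transition probabilities are all hypothetical and treated as known.

```python
import random

def simulate_markov_switching(x_values, betas, sigma, transition, seed=0):
    """Simulate y_t = beta_s * x_t + noise, where the active model s
    follows a Markov chain with the given transition matrix.

    transition[i][j] is the probability of moving from model i to model j.
    Returns the list of active model indices and the list of observations.
    """
    rng = random.Random(seed)
    model_indices = list(range(len(betas)))
    state = 0  # assumption: the process starts in the first model
    states, observations = [], []
    for x in x_values:
        states.append(state)
        observations.append(betas[state] * x + rng.gauss(0.0, sigma))
        # Draw the model that will generate the next period's observation.
        state = rng.choices(model_indices, weights=transition[state])[0]
    return states, observations

# Hypothetical settings: each model tends to persist once it is active.
x_values = [float(t) for t in range(1, 21)]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
states, y = simulate_markov_switching(
    x_values, betas=[1.0, 3.0], sigma=0.5, transition=transition)
```

When every row of the transition matrix is identical, the active model is drawn independently in each period, so period-by-period switching of the multinomial kind is a special case of this setup.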